This page covers the basics of working with Linux.

Great resources are the man pages and the [[https://www.gnu.org/software/coreutils/manual/html_node/index.html|official documentation of the GNU Coreutils]].


====== Getting Help ======

== type ==

''type'' is a bash builtin that shows how a command name would be interpreted, e.g. as a binary or script (''file''), alias, keyword, function or builtin command.
Depending on the result you must use the respective help system.

<code bash>
type ls        # show type of ls
type -a kill   # show all commands with the name kill (=one program & one builtin!)
</code>
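
In scripts, ''type -t'' is often handier because it prints a single word (''alias'', ''keyword'', ''function'', ''builtin'' or ''file''). A small sketch; the function ''greet'' is made up for the demo, and the result for ''ls'' depends on whether it is aliased in your shell:
<code bash>
greet() { echo "hi"; }   # hypothetical shell function, defined only for this demo
type -t greet            # function
type -t cd               # builtin
type -t if               # keyword
type -t ls               # file (or alias in many interactive shells)
</code>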

== help ==

''help'' is a bash builtin that gives you help for bash builtins (try ''help help'').

<code bash>
help           # list all builtin bash commands
help type      # get help for builtin command 'type'
</code>

== man, whatis, apropos ==

''[[http://www.mcmcse.com/linux/man_pages.shtml|man]]'' shows you the man(ual) page for nearly every Linux program but also config files or system calls. Man pages are divided in sections, the most important ones are programs (1), config files (5) and sysadmin tools (8).

Man pages are preprocessed with the program ''groff'' and displayed using the program ''less''. Use the arrow keys to navigate, type ''/searchphrase'' and press Enter to search, and type ''q'' to quit.

<code bash>
man man        # view man page of man itself
man passwd     # view man page of program passwd (section 1)
man 5 passwd   # view man page of /etc/passwd (section 5)
</code>

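''whatis'' shows the one-line descriptions of the man pages whose names match your search term (by default the name must match exactly).

<code bash>
whatis passwd        # one-line descriptions of all man pages named passwd
man -f passwd        # same as whatis
man -aw passwd       # show the full paths of all matching man page files instead of names only
</code>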

''apropos'' shows man pages relevant to a topic by searching through both the "Name" and "Description" sections of the man page database.

<code bash>
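apropos password     # search names and descriptions of all man pages for "password"
man -k password      # same as apropos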
</code>

== info ==

''info'' shows info pages for many GNU programs. Sometimes ''info'' gives more in-depth information than ''man'' (e.g. ''info ls''). Type ''h'' in ''info'' to get help on how to navigate with the keyboard.

<code bash>
info                 # browse the directory of all info pages
info ls              # read the info page of ls
</code>

== whereis, which ==

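''whereis'' shows the locations of a program's binary, source and man page files (its search paths also include e.g. ''/etc'', so config files may show up as well).
''which'' only shows the full path of the program that would be run, i.e. the first match in ''$PATH''.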

<code bash>
whereis ls
which ls
</code>

====== Bash ======

''bash'' is a commonly used shell, a.k.a. command-line interpreter. It usually runs inside a terminal (emulator), but is not the terminal itself.
It is used to run programs and scripts and, basically, to interact with the system.

[[linux:bash|Here configuration, basic usage and bash scripting is explained.]]


====== Files and Directories ======

===== Exploring the File System =====

== cd - change directory ==
<code bash>
cd /etc/init.d                  # change into directory /etc/init.d
cd                              # change into home directory
cd ~                            # same as cd without arguments
cd ..                           # change into parent directory
cd -                            # change into last directory you visited before
</code>

Note that ''cd'' is a shell builtin, not a program: it has to change the working directory of the shell itself.

== ls - list directory contents ==

<code bash>
# list files and show file details
ls [<dir>]                      # list files (in dir)
ls -lah [<dir>]                 # list *all* files in *long format* (permissions, owner, size in *human readable format*,..)
ls -ld [<dir>]                  # info for directory itself
</code>

By default ''ls'' sorts files by name in ascending order; this can be changed using either the short or the long options:

<code bash>
ls -lr                          # reverse order
ls -lS                          # ls -l --sort=size (biggest first)
ls -lX                          # ls -l --sort=extension (ascending)
ls -lt                          # ls -l --sort=time (modification time, newest first)
ls -lut                         # ls -l --sort=time --time=atime (last access time, newest first)
ls -lct                         # ls -l --sort=time --time=ctime (status change time, i.e. last change of content or metadata, newest first)
</code>

''ls'' can also be used to search in the current directory using bash wildcards (globs), which the shell expands before ''ls'' is run.
<code bash>
ls *.txt                        # show all text files in current directory
ls -d *dir*/                    # show all directories in current directory whose names contain 'dir'
</code>

=== Disc Space ===

== du - disc usage ==
''du'' always calculates the size of a directory recursively. The options only limit how many lines are printed; the totals do not change.

<code bash>
du                              # recursively print disc usage of all directories below the current directory (+ the current directory)
du -sh <file>                   # human-readable summary of the total disk usage of <file> (current directory if no file is given)
du -c <file1> <file2>           # grand total is displayed in addition to usage for two directories
du -h -d <depth>                # disk usage of the current directory and its subdirs, down to <depth> levels
du --exclude='*.mp3'            # disk usage except for mp3s (quoted, so the shell does not expand the pattern)
du -a                           # also print usage for files
du -s                           # only print a summary-line (disc usage of current directory)
</code>
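
To find out which subdirectories take up the most space, ''du'' can be combined with ''sort -h'' (described further down on this page). A minimal sketch, assuming GNU coreutils; the scratch directory and file names are made up:
<code bash>
tmp=$(mktemp -d)                                   # scratch directory for the demo
mkdir -p "$tmp/small" "$tmp/big"
head -c 1048576 /dev/urandom > "$tmp/big/blob"     # 1 MiB of random data
head -c 1024 /dev/urandom > "$tmp/small/blob"      # 1 KiB of random data
du -h -d 1 "$tmp" | sort -h                        # smallest first, the total for $tmp comes last
</code>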

== df - disc free ==
''df'' is (kind of) the opposite of ''du'': it shows the used and remaining space (or inodes) of mounted file systems.

<code bash>
df                              # show free space on all mounted filesystems
df -h                           # use human-readable format
df -i                           # show used / free number of inodes
df -h <file>                    # show free space on the file system where file lies
</code>

== GUI approaches ==

Great graphical insights into disc usage are given by programs that analyze a whole directory tree and let you easily identify files or directories that take up the most space:

''ncdu'' is a text-based (ncurses) program, ''filelight'' a KDE-based GUI program and ''baobab'' aka ''Disk Usage Analyzer'' is the GNOME equivalent.
===== File Handling =====

== file - determine file type ==

This tool determines the file type (and encoding for text files) by analyzing its contents.
<code bash>
file myfile                # print file type
file -i myfile             # print MIME type
file -bi myfile            # print MIME type (without prepending file name)
</code>

== touch, mkdir - create ==

<code bash>
touch file.txt             # creates an empty file using touch
> file.txt                 # creates an empty file using bash redirection (truncates file.txt if it already exists!)
mkdir directory            # creates an empty directory
mkdir -p parent/subdir     # create the whole directory tree (p = parent)
</code>

Actually, ''touch'' was written to change the timestamps of a file (by default atime and mtime, i.e. access and modification time). It also creates files that do not exist yet; the parameter ''-c'' suppresses file creation.
<code bash>
touch file.txt                                      # set atime and mtime of file.txt to current time
touch --date="2012-05-01 18:12" file.txt            # set atime and mtime of file.txt
touch --time=atime --date="2012-05-01" file.txt     # only set atime of file.txt
</code>

== cp - copy ==
<code bash>
cp srcfile targetfile      # copy files (dereferences symlinks automatically!)
cp -r srcdir targetdir     # recursively copy directories (symlinks are copied as symlinks, not dereferenced)
cp -a srcdir targetdir     # archive (backup) files: recursive, preserve symlinks, owner, timestamps, mode
cp -Rpd srcdir targetdir   # similar to cp -a, but preserves fewer attributes
</code>

Some important parameters:
  * ''-L'': always dereference symlinks
  * ''-P'': do not dereference symlinks
  * ''-d'': same as ''--no-dereference --preserve=links'' (copy symlinks as symlinks)
  * ''-p'': preserve owner, timestamps, mode (permissions)
  * ''-R'' or ''-r'': recursive copy
  * ''-u'': update - copy only if source file is newer than destination or missing
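
The difference between a plain copy and ''-p'' is easy to see with the modification time. A sketch assuming GNU ''cp'', ''touch'' and ''stat''; the file names are made up:
<code bash>
tmp=$(mktemp -d) && cd "$tmp"
touch --date="2012-05-01 18:12" orig.txt   # source file with an old modification time
cp orig.txt plain.txt                      # plain copy: mtime is the time of copying
cp -p orig.txt preserved.txt               # -p keeps mode, owner and timestamps
stat -c '%y  %n' orig.txt plain.txt preserved.txt
</code>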

== mv - move (or rename) ==

The program ''mv'' moves files or directories from one location to another. If both locations are on the same file system, it is a simple rename; otherwise the content is copied and the source is deleted afterwards.

<code bash>
mv src target              # move a file or directory. 
</code>

When moving a file to an existing target that is also a file, the target is overwritten without any notice. If the existing target is a directory, the file or directory is moved into it. It is therefore advisable to be careful when using ''mv'' with files.

<code bash>
mv -i srcfile targetfile   # interactive mode (asks if a file should be overwritten)
mv -n srcfile targetfile   # noclobber mode (does not overwrite files - but also does not print a message)
</code>

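A practical tool for bulk renaming is the perl program ''rename'', which lets you use the full power of perl regular expressions. Note that on some distributions ''rename'' is a different program from util-linux with a simpler, non-regex syntax. [[http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal|More examples.]]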
<code bash>
rename -n s/old/new/ *           # dry-run of what would happen when renaming all files and directories in the current directory
rename -v s/old/new/ *           # actually rename and print all changes
rename 's/\.htm$/\.html/' *.htm  # complex example: rename all .htm files to .html
</code>
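
Where the perl ''rename'' is not available, a plain bash loop with parameter expansion does the same job for simple cases. A sketch; the file names are made up, and ''-n'' keeps existing files from being overwritten:
<code bash>
tmp=$(mktemp -d) && cd "$tmp"
touch index.htm about.htm                 # hypothetical files for the demo
for f in *.htm; do
    mv -n -- "$f" "${f%.htm}.html"        # ${f%.htm} strips the trailing ".htm"
done
ls                                        # about.html  index.html
</code>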

== ln - link ==

To make it easier to remember: as with ''cp'', the first argument is the part that already exists.

<code bash>
ln src target              # create a hardlink (both names reference the same inode)
cp -l src target           # same, using cp
ln -s src target           # create a symbolic link (symlink)
cp -s src target           # same, using cp
</code>
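
The difference between the two link types shows up when the original name is removed. A sketch assuming GNU ''stat''; the file names are made up:
<code bash>
tmp=$(mktemp -d) && cd "$tmp"
echo data > file.txt
ln file.txt hard.txt                   # hardlink: a second name for the same inode
ln -s file.txt soft.txt                # symlink: a small file pointing to the name file.txt
stat -c '%i %h %n' file.txt hard.txt   # same inode number, link count 2
rm file.txt
cat hard.txt                           # still prints "data"
[ -e soft.txt ] || echo "soft.txt now dangles"   # the name it points to is gone
</code>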

== rm, rmdir - remove ==

<code bash>
rm file.txt                 # remove a file
rm -r directory             # recursively remove a directory
rmdir directory             # remove an empty directory
rmdir -p parent/sub         # remove an empty directory tree
</code>

==== Secure File Handling ====

== scp: secure copy ==

''scp'' can copy files between hosts (also between two remote hosts).

Recursively copy ''localDir'' to a remote host:
<code bash>
scp -r localDir user@host:remoteDir 
</code>

To copy between two remote hosts, either the source host must have credentials to log into the target host, or you use the issuing host as a relay via ''-3'' (recommended).
<code bash>
scp -3 -r user1@host1:remoteDir1 user2@host2:remoteDir2
</code>

== rsync: copy / synchronize directories locally and remotely ==

''rsync'' can keep two directories synchronized intelligently by only syncing changes, which makes it useful for e.g. remote backups.

Simple local usage, where ''localDir'' and all its contents will be copied to ''otherLocalDir/localDir'':
<code bash>
rsync -a localDir otherLocalDir
</code>

When copying to or from a remote host, ''rsync'' uses ''ssh'' by default. Quick setup:
<code bash>
# @destination (where files are copied to)
# set up ssh server (listens for incoming connections)
sudo apt-get install openssh-server

# @source
ssh-keygen
ssh-copy-id user@remotehost
rsync -av localdirectory user@remotehost:directory
</code>

[[https://help.ubuntu.com/community/rsync|More on]] [[https://wiki.ubuntuusers.de/rsync|this topic]]
===== File Permissions and Attributes =====

Each file in Linux has
  * a type
  * an owner
  * a group (similar to owner)
  * permissions (can owner/group/others read/write/execute?)
  * attributes

Note that ''root'' can not only change any file's permissions, the permissions also do not apply to ''root'' (except for execution: ''root'' can only execute a file if at least one execute bit is set)! E.g. ''root'' can always write to or delete write-protected files, or create files in a write-protected directory. To protect files / directories from unwanted changes by ''root'', attributes can be used.

The output of ''ls -l'' starts e.g. with ''drwxrwxrwx''. The first character identifies the file type, followed by three access triplets (for owner, group, others).

List of possible values for the file type (first character):
  * - regular file
  * d directory
  * l symlink
  * c character device (e.g. /dev/random)
  * b block device (e.g. /dev/sda1)
  * p fifo
  * s socket
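
The type character can be checked e.g. with ''ls -ld'' on a few paths; GNU ''stat'' also prints the type as text. A sketch; the symlink and the fifo (created with ''mkfifo'') exist only for the demo:
<code bash>
tmp=$(mktemp -d)
ln -s / "$tmp/link"                          # a symlink, created for the demo
mkfifo "$tmp/fifo"                           # a fifo (named pipe), created for the demo
ls -ld / /dev/null "$tmp/link" "$tmp/fifo"   # first character of each line: d, c, l or p
stat -c '%F: %n' / /dev/null "$tmp/fifo"     # directory, character special file, fifo
</code>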

== chown - change owner ==
With ''chown'' both the owner and the group can be changed. Simple usage examples:

<code bash>
chown myuser file.txt           # change the owner of file.txt to 'myuser'
chown myuser:mygroup file.txt   # change the owner of file.txt to 'myuser' and the group to 'mygroup'
chown :mygroup file.txt         # only change the group of file.txt to 'mygroup'
chown -R myuser directory       # change the owner of directory recursively to 'myuser'
</code>

If a symbolic link is given, by default the referenced file is changed, not the link itself. Use ''-h'' to avoid dereferencing and to change the owner of the symlink itself.

In recursive mode, symbolic links are not traversed by default (''-P''). With ''-L'', every symbolic link to a directory that is encountered is traversed. (FIXME: verify)

To only change the owner for files owned by a certain user and/or group use the ''from'' option with the ''user:group'' syntax:
<code bash>
chown -R --from=myuser:mygroup otheruser:othergroup directory
</code>

[[http://www.thegeekstuff.com/2012/06/chown-examples|More examples]]

== chgrp - change group ==

FIXME

== chmod - change permission ==

A combination of the letters **ugoa** controls which users' access to the file will be changed:
  * u the user who owns it
  * g other users in the file's group
  * o other users not in the file's group
  * a all users

The letters **rwxXst** select file mode bits for the affected users:
  * r read
  * w write
  * x execute (or search for directories)
  * X execute/search only if the file is a directory or already has execute permission for some user
  * s set user or group ID on execution
  * t restricted deletion flag or sticky bit

The letters for the affected users and for the mode are combined with one of ''=+-'' (set exactly, add, remove).

<code bash>
chmod o+rw file    # add permissions for other (everybody) to read and write file
</code>

Explicit setting of all permissions is also possible with four octal digits (e.g. 0644 or 0755 are common). The first digit represents the suid/sgid/sticky bits, the other three the permissions for ugo.
  * digit 1: suid=4, sgid=2, sticky=1
  * digits 2-4: read=4, write=2, execute=1

<code bash>
chmod 0644 file    # no suid/sgid/sticky bit; owner may read and write, group/other may only read
</code>

Some advanced examples:
<code bash>
chmod u=rws,g=r,o= file    # set suid, read/write/execute access for user,
                           # only read access for group, nothing for other
chmod 4740 file            # the same in octal notation
chmod -R a+rwX <dir>       # recursively give read/write access to a directory tree
                           # (executable bit is only set for directories - and files where
                           # one of ugo already has the executable bit set)
</code>
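
The octal digits are sums of the bit values (read=4, write=2, execute=1; in the first digit suid=4, sgid=2, sticky=1): ''rws'' for the owner is 4+2+1=7 plus suid (4) in the first digit, ''r'' for the group is 4, nothing for others is 0, giving 4740. A sketch checking this with GNU ''stat'' on a made-up scratch file:
<code bash>
tmp=$(mktemp -d)
touch "$tmp/file"
chmod 0644 "$tmp/file"; stat -c '%a %A' "$tmp/file"          # 644 -rw-r--r--
chmod u=rws,g=r,o= "$tmp/file"; stat -c '%a %A' "$tmp/file"   # 4740 -rwsr-----
</code>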

== chattr - change attributes ==
<code bash>
# chattr: change file attributes (a=append only, i=immutable, s=secure deletion,..)
lsattr file       # view file attributes
chattr +a file    # allow only appends to file (requires root to (un)set)
</code>

==== SUID, SGID, Sticky Bit ====

=== suid,sgid on files ===
When users execute files that have these bits set, the process runs with the effective user ID (suid) or group ID (sgid) of the file's owner or group instead of their own. This is why regular users can edit ''/etc/shadow'' with ''/bin/passwd''.

=== sgid on directories ===
Files created in such a directory get the directory's group, and new subdirectories inherit the sgid bit.

=== sticky bit ===
In sticky directories, a file or directory can only be deleted or renamed by its owner, the owner of the sticky directory, or root (as in /tmp).

see also: http://www.bashguru.com/2010/03/unixlinux-advanced-file-permissions.html
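
The special bits can be observed directly. A sketch assuming GNU coreutils; the scratch directory is made up, and the bash test operator ''-g'' checks for the sgid bit (''-u'' and ''-k'' do the same for suid and sticky):
<code bash>
ls -ld /tmp                          # typically drwxrwxrwt: the t is the sticky bit
d=$(mktemp -d)
chmod g+s "$d"                       # set sgid on a directory
mkdir "$d/sub"; touch "$d/sub/file"
ls -ld "$d/sub"                      # e.g. drwxr-sr-x: sub inherited the sgid bit
[ -g "$d/sub" ] && echo "sub has the sgid bit"
stat -c '%G %n' "$d" "$d/sub/file"   # both show the same group
</code>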

== umask ==

The [[http://en.wikipedia.org/wiki/Umask|umask]] determines which file permissions are //not// set for files and directories when they are created. It works as a mask: every bit set in the umask is removed from the default permissions (666 for files, 777 for directories). A typical umask is 022: 666 with the 022 bits removed gives 644, which means the owner can read and write a file, group and other can only read it. For 022 plain subtraction happens to give the same result, but e.g. a umask of 033 yields 644, not 633.
 
''umask'' is a bash built-in to set & view the current umask of a user. Typically it is invoked in /etc/profile (or a file referenced from there).

<code bash>
umask              # print currently active umask (octal)
umask -S           # print currently active umask (symbolic output)
umask 0022         # set a umask (octal), e.g. 0022
</code>
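
The mask operation can be checked with bash arithmetic (a leading 0 makes bash read a number as octal) and on real files. A sketch assuming GNU ''stat''; the scratch paths are made up:
<code bash>
printf '%o\n' $(( 0666 & ~0022 ))   # 644: new files with umask 022
printf '%o\n' $(( 0777 & ~0022 ))   # 755: new directories with umask 022
printf '%o\n' $(( 0666 & ~0033 ))   # 644, not 633: bits are masked, not subtracted
tmp=$(mktemp -d)
( umask 027; touch "$tmp/f"; mkdir "$tmp/d" )   # subshell: the current shell's umask stays unchanged
stat -c '%a %n' "$tmp/f" "$tmp/d"                # 640 for the file, 750 for the directory
</code>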
===== Search ======

Two important tools can be used for searching files in Linux: ''find'' and ''locate''.
The big difference between those two: ''find'' searches the system 'live', ''locate'' uses a database that is typically updated only once a day.

== find ==

In the simplest invocation ''find'' just prints all files in the current directory and all its subdirectories (similar to ''tree'').
<code bash>
find                        # print all filenames in dir + subdirs
find dir1 dir2              # print many dirs + subdirs
</code>

To filter the files, an "expression" must be added after the directories. An expression can be quite complex and can consist of options, tests and actions. The most straightforward test is ''-name'':

<code bash>
find dir1 dir2 -name "*.jpg"      # find all .jpg files in both directories
find dir1 dir2 -iname "*.jpg"     # same, but ignore case (e.g. also find .JPG)
</code>

Some examples for **options**
<code bash>
find dir -maxdepth 1              # find all files in dir (but not its subdirs)
find dir -xdev -name "*.jpg"      # find files only in the current file system
                                  # (i.e. does not search NFS shares or /proc)
</code>

Some examples for **tests**:
<code bash>
find dir -size +10M               # find files bigger than 10 megabytes
find dir -size -50c               # find files smaller than 50 bytes (c=byte, k=kilobyte, M=megabyte, G=gigabyte)
find /bin -perm -u=s              # find all executables in /bin that have the suid bit set for its owner
find dir -perm 644                # find files with permissions being exactly 0644
find dir -atime -3                # find files accessed less than 3 days (72 hours) ago
find dir -cmin +10                # find files whose status (content or metadata) changed more than 10 minutes ago (content only: -mmin/-mtime)
find /usr/bin -executable -type f # find all executable files in /usr/bin
</code>

Some examples for **actions**: delete files, execute commands
<code bash>
find dir -exec command {} \;      # execute a command for each of the found file - {} will be replaced with the filename
find dir -exec command {} +       # execute one command for all of the found files - {} will be replaced with the filenames
find dir -execdir command {} \;   # same as -exec, but the execution directory is the directory of the file, not the starting directory
find dir -delete                  # delete found files
</code>
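
The difference between ''\;'' and ''+'' is easiest to see with ''echo'' as the command. Note the quotes around the pattern, which keep the shell from expanding ''*.txt'' itself. A sketch with made-up file names:
<code bash>
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"
find "$tmp" -name '*.txt' -exec echo run: {} \;   # three lines: echo is started once per file
find "$tmp" -name '*.txt' -exec echo run: {} +    # one line: all files are passed to a single echo
</code>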

http://linux.101hacks.com/linux-commands/find-command-examples/

== locate ==

locate & slocate..


== grep ==

For searching contents of files ''grep'' can be used:

<code bash>
# find all files in or below the current directory that contain "searchString"
grep -r searchString .
</code>
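
A few common variations (GNU grep options), tried on made-up files:
<code bash>
tmp=$(mktemp -d)
printf 'foo\nsearchString here\n' > "$tmp/a.txt"
printf 'nothing\n' > "$tmp/b.log"
grep -rn searchString "$tmp"                     # prefix each match with file name and line number
grep -rl searchString "$tmp"                     # only list the names of matching files
grep -ri --include='*.txt' searchstring "$tmp"   # ignore case, only search in *.txt files
</code>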

===== Split & Merge ======

''split'' can split one file into several files, either by lines (-l N), by size (-b SIZE) or into a number of equal chunks (-n N). By default the pieces are named ''x'' plus a two-letter suffix iterating through the alphabet, from xaa to xzz. The names can be changed: the prefix is an optional argument after the file name, and the suffix length is set with -a.

<code bash>
split file.txt                      # split file.txt in chunks of 1000 lines and name them xaa, xab,..
cat xa* > merged.txt                # merge split file again
split -a 3 -l 1 big.csv splitfile   # create files with the names 'splitfile***', containing one line each
split -b 700M archive.tar.gz        # split an archive into 700MB pieces
</code>
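
A quick round trip shows that the pieces merge back into the original file. A sketch; the file names are made up:
<code bash>
tmp=$(mktemp -d) && cd "$tmp"
seq 1 2500 > file.txt
split file.txt                  # 1000 lines per piece: xaa, xab, xac
wc -l xa*                       # 1000, 1000 and 500 lines (plus a total)
cat xa* > merged.txt            # the glob sorts the pieces alphabetically, i.e. in order
cmp file.txt merged.txt && echo "identical"
</code>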

''csplit'' splits files according to a context (regex), hence the c. 

<code bash>
csplit file.txt "/regex/"           # split file at the first occurrence of the regular expression (pieces: xx00, xx01)
csplit file.txt "/regex/" "{*}"     # split file at all occurrences of the regular expression
</code>
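
The first piece holds everything before the first match, each further piece starts with a matching line. A sketch with a made-up input file, using ''-s'' to suppress the byte counts that ''csplit'' prints otherwise:
<code bash>
tmp=$(mktemp -d) && cd "$tmp"
printf 'intro\n## A\na1\n## B\nb1\n' > doc.txt   # hypothetical input file
csplit -s doc.txt '/^## /' '{*}'                 # split before every line starting with "## "
head xx*                                         # xx00: intro, xx01: ## A + a1, xx02: ## B + b1
</code>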

Merging files is handled by ''cat'' (concatenate)
<code bash>
cat file1 file2 file3 > mergedfile  
</code>
===== Zip =====

==== Uncompressed archives ====
<code bash>
tar -cf archive.tar a b c            # create tar archive of file/folders a, b, c
tar -tf archive.tar                  # list contents of archive (actually: tests integrity of archive)
tar -xf archive.tar  -C destination  # extract content of tar archive into folder destination (otherwise: current directory)
</code>
''tar'' creates uncompressed archives by default.

==== Compressing files ====
For compressing files the formats gzip (.gz), bzip2 (.bz2) and xz (.xz, which uses LZMA compression) are commonly used.

Gzip is the most commonly used one, but according to [[https://www.rootusers.com/gzip-vs-bzip2-vs-xz-performance-comparison|this in-depth comparison]] .xz is the superior format: it is fast to decompress, achieves high compression rates and also has reasonable compression times up to level 2 or 3.

The following commands (un)compress a single file and then remove the old (un)compressed file.
<code bash>
gzip my.txt                          # creates the gzipped file my.txt.gz and removes my.txt
gunzip my.txt.gz                     # recreates my.txt and removes my.txt.gz; same as gzip -d
</code>
The commands ''bzip2'' + ''bunzip2'' and ''xz'' + ''unxz'' work in the same fashion.

For all three formats there is a ''cat''-style command which writes the uncompressed data to stdout:
<code bash>
zcat                 # stream uncompressed gzip file to stdout, same as gzip -cd
bzcat                # same as bzip2 -dc
xzcat                # same as xz --decompress --stdout
</code>


==== Compressed Archives ====
Often you want compressed archives, e.g. .tar.gz files. These can either be created with separate calls to ''tar'' and ''gzip'' or directly with ''tar'':

<code bash>
tar -czf archive.tar.gz a b c                  # create gzipped archive
tar -cf archive.tar a b c && gzip archive.tar  # does the same
tar -xzf archive.tar.gz                        # extract gzipped archive
gunzip archive.tar.gz && tar -xf archive.tar   # does the same
</code>
For creating or extracting compressed archives use the following flags: gzip (-z), bzip2 (-j), or xz (-J).
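
A round trip through a gzip-compressed archive, as a sketch with made-up names:
<code bash>
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p a/sub && echo hello > a/sub/file.txt
tar -czf archive.tar.gz a                      # create a gzip-compressed archive of directory a
tar -tzf archive.tar.gz                        # list its contents: a/, a/sub/, a/sub/file.txt
mkdir out && tar -xzf archive.tar.gz -C out    # extract into out/
diff -r a out/a && echo "round trip OK"
</code>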

==== Working with archives ====

<code bash>
tar -tf my.tar           # print list of contents
</code>

<code bash>
cpio
</code>
''cpio'' can copy files to and from archives FIXME


====== System Information ======

System information can be gathered from many places. Apart from various programs, the two virtual file systems ''procfs'' and ''sysfs'' are quite important. They allow us to peek into the kernel via ''/proc'' and ''/sys'', where information about processes and other system information is presented in a hierarchical file-like structure, meaning we can use our favorite tools on it like on any other file in Linux.

Get help for ''procfs'' with ''man proc'' or on [[http://en.wikipedia.org/wiki/Procfs|Wikipedia]].

''Sysfs'' is a successor of procfs and exports kernel data structures, their attributes, and the linkages between them to userspace. Documentation is [[https://www.kernel.org/doc/Documentation/ABI/testing|here]].

===== Hardware Information =====

Via ''procfs'' we can get information about the CPU, interrupts (IRQs) and the IO ports of devices; the latter two only if the kernel module for the device is loaded.
<code bash>
cat /proc/cpuinfo    # CPU
cat /proc/scsi/scsi  # available SCSI devices
cat /proc/interrupts # IRQs
cat /proc/ioports    # IO ports
</code>

These programs also find devices without a working or loaded kernel module by probing the hardware live. If their standard output is not informative enough, try the verbose flag (-v or even -vv).
<code bash>
lspci                # list PCI devices - live information about PCI buses in the system and devices connected to them.
lsusb                # list USB devices
lshw                 # extract detailed information on the hardware configuration
                     # (basically every possible piece of hardware)
</code>

Advanced ''lshw''
<code bash>
lshw -html > /tmp/hardware.html  # generate an HTML report to view in a browser
lshw -short            # quick overview
lshw -class network    # filter by class (see -short for classes)
</code>

The tool ''get-edid'' (in the package read-edid) can help to identify monitors (which are not included in lshw or lspci):
<code bash>
sudo apt-get install read-edid
sudo get-edid
</code>


GUI applications available include ''kinfocenter'' or ''hardinfo''.

==== Memory ====

<code bash>
free -m              # quick overview of total, free and used memory (and swap) in megabytes
cat /proc/meminfo    # in-depth look at memory stats via procfs
</code>

==== Sensors ====
Install the package ''lm-sensors'' to view sensor values like temperature or fan rpm:
<code bash>
sudo sensors-detect  # detect which sensors are available (interactive, requires root)
sensors              # view all sensor values
</code>

===== OS / distribution version =====

<code bash>
uname -a             # print kernel version, platform (i.e. 32/64 bit),..
lsb_release -a       # print distribution information, e.g. release name and version
cat /etc/os-release  # text file containing about the same information as in lsb_release
cat /etc/issue       # text file containing login greeting (typically the release name)
</code>

===== Kernel =====

Get kernel-related information:

<code bash>
cat /var/log/dmesg   # kernel messages (also the ones during boot)
dmesg                # print all kernel messages in kernel ring buffer (look for error messages there; and local printers)
cat /proc/cmdline    # kernel boot time arguments
lsmod                # show the status of modules in the Linux Kernel
cat /proc/modules    # loaded kernel modules (see also: lsmod, /etc/modules)
modinfo              # get information about kernel modules (e.g. parameters, version)
cat /lib/modules/<kernelversion>/modules.dep   # available kernel modules
lspci -v             # show hardware and which kernel driver they use
</code>

Programs & config files to load or unload kernel modules:

<code bash>
modprobe             # program to add and remove modules from the Linux Kernel (handles dependencies)
insmod               # simple program to insert a module into the Linux Kernel
rmmod                # simple program to remove a module from the Linux Kernel
depmod               # program to generate modules.dep
cat /etc/modules.conf   # config file specifying modules loaded at startup
</code>

Note: in Ubuntu 12.04 the file ''/etc/modules.conf'' seems to be replaced by the combination of ''/etc/modules'' and ''/etc/modprobe.d/*''.

Other system information:
<code bash>
cat /var/log/messages   # system log - after start of the logging daemon (since Ubuntu 11.04: /var/log/syslog)
cat /var/log/boot.log   # system log - also before start of the logging daemon (written by dmesg after boot?)
</code>

===== Filesystem information =====

<code bash>
cat /proc/filesystems  # supported filesystems (currently loaded kernel modules)
cat /proc/swaps      # current swap partitions
cat /etc/mtab        # currently mounted partitions (the same output as when calling mount without parameters)
cat /etc/fstab       # config file for mountpoints
</code>

===== Users =====

These programs give (slightly different) information about the currently logged in users (and more)
<code bash>
w                    # currently logged in users & what they are doing (+uptime as first line)
who                  # currently logged in users
</code>

===== Uptime & boot date =====

<code bash>
who -b               # date+time of last system boot
uptime               # current time, how long the system has been running, how many users are currently logged on
                     # and the system load averages for the past 1, 5, and 15 minutes
cat /proc/uptime     # uptime of the system (seconds), and the amount of time spent in idle process (seconds)
</code>
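
The first number in ''/proc/uptime'' can be turned into days, hours and minutes with bash arithmetic. A sketch; the variable names are arbitrary:
<code bash>
read -r up idle < /proc/uptime    # two numbers in seconds; idle is not used further
s=${up%.*}                        # drop the fractional part, e.g. 12345.67 -> 12345
printf 'up %d days, %d hours, %d minutes\n' \
    $(( s / 86400 )) $(( s % 86400 / 3600 )) $(( s % 3600 / 60 ))
</code>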

===== Date & Time =====

''date'' is a versatile tool for formatting dates / times and setting the system time ([[linux:installation#date_timezone|see also]]).

Print the current time in different formats
<code bash>
date                      # default - a human readable format
date -I                   # only the date in ISO 8601
date -Iseconds            # date and time in ISO 8601
date +%Y-%m-%dT%H%M       # custom format useful for scripts, e.g. 2019-08-14T1129
</code>

Convert unix timestamps
<code bash>
date -d @1278923870       # print the unix timestamp in a human readable format
</code>
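
''-u'' makes the output independent of the local time zone, and ''+%s'' converts in the other direction. A sketch using the timestamp from above:
<code bash>
date -u -d @1278923870 '+%Y-%m-%d %H:%M:%S'   # 2010-07-12 08:37:50 (UTC)
date -u -d '2010-07-12 08:37:50' +%s          # 1278923870, the reverse direction
date +%s                                      # current time as a unix timestamp
</code>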


====== Working with Text ======

<code bash>
more file.txt                       # page through contents of file
less file.txt                       # page through contents of file more conveniently
</code>

<code bash>
paste file1 file2                   # view lines of two files next to each other, separated by a tab
paste -d ';' a.csv b.csv            # merge the columns of two .csv files line by line
</code>
''paste'' shows files next to each other, i.e. concatenates them horizontally, line by line. This is useful to e.g. merge .csv files or for a quick side-by-side comparison. The delimiter (a tab by default) can be changed with -d.

<code bash>
nl file.txt                         # view content of file with line numbers
nl -i 5 -s '|--|' sort.txt          # increment line number by 5 and display the string |--| between line number and line 
</code>
''[[http://www.thegeekstuff.com/2013/02/wc-nl-examples|nl]]'' numbers lines. The width of the line-number can be adjusted (-w N).

<code bash>
fmt lorem.txt                       # reformat text to 75 characters per line
</code>
''fmt'' does the same as dynamic word wrap in modern editors. Paragraphs are reformatted to a character width (-w N), optionally with a different indentation for the first line per paragraph (-t).

<code bash>
rev lorem.txt                       # reverse each line and print it to stdout
</code>
''rev'' reverses lines of text

<code bash>
pr lorem.txt                        # create pages with 66 lines including a header (in one text file)
pr -2 -l 50 lorem.txt               # 2-column layout, only 50 lines per page
</code>
''pr'' converts text files for printing and should be combined with ''fmt'' because it takes lines as-is.

<code bash>
tee                                 # write stdin to stdout
echo 'hello' | tee file.txt         # writes 'hello' to stdout and into file.txt
</code>
''tee'' writes its input to stdout and to as many files as desired. With -a the files are appended to instead of overwritten.
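
A sketch of overwriting versus appending; the file names are made up:
<code bash>
tmp=$(mktemp -d)
echo 'first'  | tee "$tmp/log.txt"                     # overwrite log.txt and print to stdout
echo 'second' | tee -a "$tmp/log.txt" "$tmp/copy.txt"  # append to log.txt, create copy.txt
cat "$tmp/log.txt"                                     # first, second
</code>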

<code bash>
wc -l file1.txt file2.txt           # count the lines of both files (and print a total)
</code>
''wc'' aka 'word count' counts lines (-l), words (-w), (UTF-8) characters (-m) and bytes (-c).

===== Filtering & Selecting =====

== tr ==

''[[http://www.thegeekstuff.com/2012/12/linux-tr-command|tr]]'', aka translate characters, is a good supplement to ''sed'' when it comes to special characters. With ''tr'' newlines, spaces, and non-printing characters can easily be replaced, deleted or squeezed. Important arguments are delete (-d), squeeze (-s) and complement (-c).

<code bash>
tr -d '[:space:]' < lorem.txt    # delete all white-space from lorem.txt and print it to stdout
tr -s '\n' ' ' < file.txt        # join all lines of a file into a single line
tr 'A-Za-z' 'N-ZA-Mn-za-m' < file.txt   # rot13
</code>
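
A few more ''tr'' one-liners on made-up input. Note that ''tr'' only reads stdin, so files are passed via ''<'' or a pipe:
<code bash>
echo 'Hello, World' | tr 'A-Za-z' 'N-ZA-Mn-za-m'   # Uryyb, Jbeyq
echo 'Uryyb, Jbeyq' | tr 'A-Za-z' 'N-ZA-Mn-za-m'   # Hello, World (rot13 is its own inverse)
printf 'a  b\n\nc\n' | tr -s ' \n'                 # a b, c: runs of spaces / newlines squeezed
echo 'phone: 555-1234' | tr -cd '0-9\n'            # 5551234: delete everything except digits and newline
</code>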

== cat & tac == 

''cat'' and ''tac'' concatenate files (opposite of ''split''). ''cat'' is also often used to simply print a file to stdout to view its content or pipe it into other programs.

<code bash>
cat file1 file2    #print file1 and then file2 to stdout
cat -A file1       #show non-printing characters, tabs, line ends. ugly results for non-ASCII characters.
tac file1 file2    #print file1 and then file2 to stdout, but reverse the line order for each file
</code>

== sort ==

''sort'' can be used to sort lines in one (or more) files. The ordering can be alphanumeric (default), numeric (-n), human-readable numeric e.g. 2K 1M (-h),... The input can also be sorted in reverse (-r) or randomized (-R).

<code bash>
sort file1.txt file2.txt     #print sorted lines from all input files
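sort -k 1.3 file.txt         #sort by the line content starting at the third character (of the first field) up to the end of the line
sort -t ';' -k 3,3 file.csv  #print lines of a CSV, sorted by the third column (-k 3 alone would sort from column 3 to the end of the line)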
sort -u file.txt             #sort file and only print unique lines (see also: ''uniq'')
</code>
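
The difference between the orderings is easiest to see on generated input. In this sketch ''LC_ALL=C'' only serves to make the alphabetical result independent of the current locale.

<code bash>
printf '10\n9\n100\n' | LC_ALL=C sort   # 10, 100, 9 (compared character by character)
printf '10\n9\n100\n' | sort -n         # 9, 10, 100
printf '1M\n2K\n3G\n' | sort -h         # 2K, 1M, 3G
# sort a semicolon-separated table numerically by the second column only
printf 'b;2;x\na;10;y\nc;1;z\n' | sort -t ';' -k 2,2n   # c;1;z, b;2;x, a;10;y
</code>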

== shuf ==

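''shuf'' shuffles lines randomly, i.e. the opposite of ''sort''.
Unlike ''%%sort --random-sort%%'' (-R), which sorts by a random hash of each key and therefore keeps identical lines next to each other, ''shuf'' produces a true random permutation. It is also often reported to be faster.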
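
A few typical uses, sketched with generated input instead of a file:

<code bash>
seq 1 10 | shuf                        # the numbers 1..10 in random order
shuf -n 3 -e red green blue yellow     # pick 3 of the given arguments
shuf -i 1-6 -n 1                       # one random integer between 1 and 6, e.g. a dice roll
</code>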

== uniq ==

With ''uniq'' you can filter (omit) or report adjacent repeated lines.

<code bash>
uniq file.txt                 #filter adjacent repeated lines
uniq -i -u file.txt           #ignore case and only print lines that are not repeated
uniq -c -d file.txt           #only print repeated lines, prefixed with their count
uniq -s 2 -w 4 file.txt       #skip the first 2 characters of each line, then compare at most 4 characters
</code>
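
Because ''uniq'' only looks at adjacent lines, it is usually preceded by ''sort''. This sketch shows the pitfall and the classic frequency-count pipeline:

<code bash>
# uniq only collapses adjacent duplicates: the second 'a' survives here
printf 'a\nb\na\n' | uniq                      # a, b, a
# sort first to count how often each line occurs, most frequent first
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' | sort | uniq -c | sort -rn
# 3 apple, 2 banana, 1 cherry (the counts are left-padded with spaces)
</code>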

== head & tail ==

''head'' and ''tail'' allow you to print the first / last lines or bytes of a file.

<code bash>
head file.txt             # print first 10 lines
head -n 5 file.txt        # print first 5 lines
head -c 10 file.txt       # print first 10 bytes (there is no option for words)
tail -n 10 file.txt       # print last 10 lines
tail -n +2 file.txt       # print everything from line 2 on (everything except first line)
</code>

''tail'' can also be used to "follow" a file, i.e. print changes to the screen in real time:
<code bash>
tail -f file.txt    # print last 10 lines and all lines that are appended to the file in the future
tail -F file.txt    # same as --follow=name --retry: keeps following when the file is deleted and recreated, e.g. by log rotation
</code>
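
''head'' and ''tail'' can also be chained to cut out a range of lines, or to skip a header. A minimal sketch with generated input:

<code bash>
# print lines 10 to 20: take the first 20 lines, then everything from the 10th on
seq 1 100 | head -n 20 | tail -n +10
# skip a header line, e.g. before sorting the rest of a CSV
printf 'name;age\nbob;25\nalice;30\n' | tail -n +2 | sort   # alice;30, bob;25
</code>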


==== Regular Expressions ====

Regular expressions are a very powerful tool to match strings of text. They come in various flavours; the ones of concern here are the POSIX.2 basic and extended regexes, which are documented in ''man 7 regex''.
A great resource, which also lists the subtle differences between the regex flavours of different programming languages and tools, is [[http://www.rexegg.com|RexEgg]].
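
The most visible difference between basic (BRE) and extended (ERE) regexes is which metacharacters need a backslash. A small sketch with ''grep'', where ''-E'' switches to ERE:

<code bash>
# BRE (grep, sed): interval braces must be escaped
printf 'caat\ncat\n' | grep 'a\{2\}'    # caat
# ERE (grep -E, sed -E): the same pattern without backslashes
printf 'caat\ncat\n' | grep -E 'a{2}'   # caat
</code>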

=== Tools making heavy use of regular expressions ===

== awk ==

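''awk'' is a small programming language for processing text line by line; each line is split into fields (''$1'', ''$2'', ...), by default at whitespace or at the separator given with -F. See [[http://www.pement.org/awk/awk1line.txt|this list of one-liners]] for more inspiration.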
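
A minimal sketch on a small semicolon-separated file (the file name and its contents are made up for illustration): printing a column, filtering by a condition on a column, and computing an average in the ''END'' block.

<code bash>
tmp=$(mktemp -d)
printf 'alice;30;Vienna\nbob;25;Graz\ncarol;35;Linz\n' > "$tmp/people.csv"
awk -F ';' '{ print $1 }' "$tmp/people.csv"                         # alice, bob, carol
awk -F ';' '$2 > 28 { print $1, $3 }' "$tmp/people.csv"              # alice Vienna, carol Linz
awk -F ';' '{ sum += $2 } END { print sum / NR }' "$tmp/people.csv"  # 30 (average of column 2)
</code>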

== sed ==

''sed'' aka 'stream editor' is commonly used to **replace strings** in lines of text files. See [[http://sed.sourceforge.net/sed1line.txt|this list of one-liners]] for more inspiration.

<code bash>
sed 's/abc/xxx/' file.txt     #simple use-case: replace the first 'abc' in each line of file.txt with 'xxx' and print the result to stdout
sed 's/abc/xxx/g' file.txt    #global replace: replace all occurrences of 'abc' in each line
sed -i -e 's/a c/x x/g' -e 's/a/b/' file.txt    # edit in place (overwrite file.txt) with -i, and apply several expressions with -e
</code>
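
A sketch of the difference between the first and the global replacement, plus a safer variant of in-place editing: with GNU sed, a suffix directly after ''-i'' keeps a backup of the original file. The file name is a placeholder.

<code bash>
tmp=$(mktemp -d)
printf 'abc abc\n' > "$tmp/file.txt"
sed 's/abc/xxx/' "$tmp/file.txt"      # xxx abc
sed 's/abc/xxx/g' "$tmp/file.txt"     # xxx xxx
# GNU sed: -i with a suffix edits in place and keeps the original as file.txt.bak
sed -i.bak 's/abc/xxx/g' "$tmp/file.txt"
</code>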

Two good tricks to create readable regexes are to (1) quote the whole expression with single quotes, so that the shell does not interpret the backslashes, and (2) use a different separator than ''/'' when appropriate. See for yourself in this example, which replaces all backslashes in a line with slashes.

<code bash>
sed s/\\\\/\\//g count.txt   #hard to read: backslashes escaped for both the shell and the regex
sed s#\\\\#/#g count.txt     #use '#' as separator, so the slash needs no escaping
sed 's#\\#/#g' count.txt     #avoid escaping for the shell. we still must escape the backslash for the regex.
</code>

Advanced usage examples:
<code bash>
sed -n 1~2p file.txt         #only print odd lines (GNU extension first~step: print line 1, then every 2nd line)
</code>

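By default ''sed'' uses basic regular expressions (BRE), in which grouping parentheses and ''{}'' intervals (and, as a GNU extension, ''+'', ''?'' and ''|'') must be escaped with a backslash, e.g. ''\(...\)''. Use the switch ''-E'' to activate extended regular expressions (ERE) and write groups with plain parentheses: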

<code bash>
sed -E 's#(.*),(.*)#\1;\2#' file.txt   #replace the last comma in each line with a semicolon (.* is greedy)
</code>
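
A small sketch contrasting BRE and ERE syntax for the same capture groups, and showing what greediness means for the comma example when a line contains more than one comma:

<code bash>
# BRE: grouping parentheses must be escaped
echo 'Doe,John' | sed 's/\(.*\),\(.*\)/\2 \1/'    # John Doe
# ERE (-E): plain parentheses
echo 'Doe,John' | sed -E 's/(.*),(.*)/\2 \1/'     # John Doe
# with several commas, the greedy first group swallows all but the last one
echo 'a,b,c' | sed -E 's#(.*),(.*)#\1;\2#'        # a,b;c
</code>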

However, even with extended expressions ''sed'' does not support non-greedy quantifiers such as ''.*?'', which PCRE (Perl Compatible Regular Expressions) provides. A good resort is to simply use ''perl'' itself:

<code bash>
perl -pe 's/.*thevalue="(.*?)".*/$1/' file.txt   #replace each line containing thevalue="..." with just that value
</code>
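
The effect of the non-greedy quantifier is easiest to see on a line with several quoted values (the sample line is made up). The sketch also shows a common ''sed''-only workaround: a negated character class such as ''[^"]*'' cannot run past the closing quote.

<code bash>
line='<item thevalue="one" other="two">'
# greedy: (.*) runs up to the last quote that still lets the rest match
echo "$line" | sed -E 's/.*thevalue="(.*)".*/\1/'      # one" other="two
# non-greedy (.*?) in Perl stops at the first closing quote
echo "$line" | perl -pe 's/.*thevalue="(.*?)".*/$1/'   # one
# sed-only workaround with a negated character class
echo "$line" | sed -E 's/.*thevalue="([^"]*)".*/\1/'   # one
</code>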

== grep ==

''grep'' is used to **print the lines** of a file that match a regex.
<code bash>
grep regex file.txt              #print lines containing the string 'regex' in file.txt
grep '^[[:digit:]]' file.txt     #print lines starting with a digit
grep '[[:alpha:]]$' file.txt     #print lines ending with a letter
grep '[1-3]a' file.txt           #print lines containing 1a, 2a or 3a
</code>
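
Some frequently used options, sketched on a tiny sample file (the file name and contents are placeholders):

<code bash>
tmp=$(mktemp -d)
printf 'Apple\nbanana\n3 apples\n' > "$tmp/fruits.txt"
grep -i 'apple' "$tmp/fruits.txt"        # Apple, 3 apples (-i: ignore case)
grep -c -i 'apple' "$tmp/fruits.txt"     # 2 (-c: count matching lines)
grep -v 'an' "$tmp/fruits.txt"           # Apple, 3 apples (-v: print non-matching lines)
grep -n 'banana' "$tmp/fruits.txt"       # 2:banana (-n: prefix line numbers)
grep -o '[[:digit:]]' "$tmp/fruits.txt"  # 3 (-o: print only the matching part)
</code>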

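Flavours of grep (''egrep'' and ''fgrep'' are deprecated in recent GNU grep versions, which print a warning; prefer ''grep -E'' and ''grep -F''):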
<code bash>
rgrep         #same as grep -r: recursively execute grep for each file under a directory
fgrep         #same as grep -F: search for fixed strings, do not interpret the search string as regex
egrep         #same as grep -E: use extended regular expressions
egrep '([1-3]a){2}' file.txt   #print lines containing two consecutive matches of 1a, 2a or 3a, e.g. '1a3a'
egrep 'one|two' file.txt       #print lines containing either 'one' or 'two' (using extended regular expressions)
</code>
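
A sketch of why ''-F'' matters: without it, the dot is a regex metacharacter and matches any character. The second part shows the ''-E'' repetition example on sample input.

<code bash>
# without -F the dot matches any character, so 'axb' matches too
printf 'a.b\naxb\n' | grep 'a.b'        # a.b, axb
printf 'a.b\naxb\n' | grep -F 'a.b'     # a.b
# -E: two consecutive matches of 1a, 2a or 3a
printf '1a3a\n1a\n3b3b\n' | grep -E '([1-3]a){2}'   # 1a3a
</code>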

=== Regular expression tips ===

== Capture groups ==

If you want to insert text with regular expressions, [[http://www.rexegg.com/regex-capture.html|groups]] come in handy. A group is defined by parentheses ''()'' in the find-expression. Whatever the expression within the parentheses matched can be re-inserted in the replace-expression with ''\1'', ''\2'', ... (e.g. in ''sed'') or, in Perl-based tools such as ''rename'', with ''$n'' or ''${n}'', where n is the group number. For example:

<code bash>
rename 's/(^\d*)/${1}_insert/' 1234_abc.txt     # renames file to: 1234_insert_abc.txt
</code>
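
The ''rename'' above is the Perl-based variant (packaged as ''rename'' on Debian/Ubuntu); other distributions ship the util-linux ''rename'', which has a different syntax. A portable sketch of the same renaming with a loop, ''sed'' and ''mv'', where the captured digits are re-inserted with ''\1''. The file names are placeholders.

<code bash>
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 1234_abc.txt 56_def.txt
for f in [0-9]*_*.txt; do
    # group 1 captures the leading digits; \1 re-inserts them before '_insert'
    new=$(printf '%s\n' "$f" | sed -E 's/^([0-9]+)/\1_insert/')
    mv -- "$f" "$new"
done
ls    # 1234_insert_abc.txt  56_insert_def.txt
</code>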

===== CSV handling à la Database =====

<code bash>
expand file.txt           #replace each tab with (up to) 8 spaces
expand -t 4 file.txt      #replace each tab with (up to) 4 spaces
expand -t 8,16 file.txt   #set tab stops at columns 8 and 16; tabs after the last tab stop are replaced with a single space
</code>
''expand'' replaces tabs with spaces. ''unexpand'' does the opposite.
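
A small sketch that makes the result visible with ''cat -A'', which shows tabs as ''^I'' and line ends as ''$''. Note that ''unexpand'' by default only converts leading blanks.

<code bash>
printf 'a\tb\n' | expand -t 4 | cat -A       # a   b$  (the tab is filled up to column 4)
printf '        x\n' | unexpand | cat -A     # ^Ix$   (8 leading spaces become one tab)
</code>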

<code bash>
cut -c 3-5 file.txt       #print characters 3 to 5 of each line
cut -d ';' -f 1,3 csv.csv #print columns 1 and 3 of a CSV
</code>
''[[http://how-to.linuxcareer.com/learning-linux-commands-cut|cut]]'' can be used to cut bytes (-b), characters (-c) or fields (-f); fields are separated by tabs unless another delimiter is set with -d. In Ubuntu 12.04, cut is buggy: -c is treated as -b, so the tool breaks multi-byte UTF-8 characters.
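
A few more field selections, sketched on generated input: an open range (''2-'' means field 2 to the end), the default tab delimiter, and a real system file.

<code bash>
printf 'name;age;city\nalice;30;Vienna\n' | cut -d ';' -f 2-   # age;city, 30;Vienna
printf 'a\tb\tc\n' | cut -f 2                                   # b (tab is the default delimiter)
cut -d ':' -f 1,7 /etc/passwd                                   # user name and login shell
</code>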

<code bash>
join a.csv b.csv                   #join two files on first field
join -t ';' -j 2 a.csv b.csv       #join two semicolon-separated files on the second field
join -t ';' -1 1 -2 4 a.csv b.csv  #join first field of a.csv on fourth field of b.csv
join --header a.csv b.csv          #joins and treats first line as header
</code>
''[[http://how-to.linuxcareer.com/learning-linux-commands-join|join]]'' is used to join e.g. two CSVs on a common column; both files must be sorted on that column. The default field separator is blanks, i.e. spaces or tabs.
By default, lines without a partner in the other file (unpairable lines) are omitted, so a join without any equal fields produces empty output. Use -a 1 or -a 2 to also print the unpairable lines of the first or second file.
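
A minimal sketch with two made-up files: both are sorted on the join field first, then joined, once with and once without the unpairable lines of the first file.

<code bash>
tmp=$(mktemp -d)
printf '2;Bob\n1;Alice\n3;Carol\n' > "$tmp/names.csv"
printf '3;Linz\n1;Vienna\n4;Graz\n' > "$tmp/cities.csv"
# join needs both inputs sorted on the join field (here: field 1)
sort -t ';' -k 1,1 "$tmp/names.csv" > "$tmp/names.sorted"
sort -t ';' -k 1,1 "$tmp/cities.csv" > "$tmp/cities.sorted"
join -t ';' "$tmp/names.sorted" "$tmp/cities.sorted"        # 1;Alice;Vienna, 3;Carol;Linz
join -t ';' -a 1 "$tmp/names.sorted" "$tmp/cities.sorted"   # additionally prints the unpairable 2;Bob
</code>

In bash, the intermediate files can be avoided with process substitution, e.g. ''join -t ';' <(sort -t ';' -k 1,1 names.csv) <(sort -t ';' -k 1,1 cities.csv)''.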


===== Internationalization =====

Text can come in various encodings. ''recode'' and ''iconv'' convert file contents from one encoding to another, ''dos2unix'' and ''unix2dos'' convert line endings, and ''convmv'' converts filenames from one encoding to another.

Common encodings encountered in Austria are:
  * ASCII: 7 bit (128 characters) and base for all other encodings in the list
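  * ISO-8859-1 (latin1): old Western European ASCII extension
  * ISO-8859-15 (latin9): slightly updated version of latin1 (e.g. with €)
  * CP-1252 (Windows-1252): code page used by Western European (e.g. German) versions of Windows (superset of ISO-8859-1)
  * UTF-8: variable-length encoding of Unicode (1 to 4 bytes per character), backwards compatible with ASCII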

== recode - in-place recoding & filtering ==

''recode'' can operate in two modes:
  * as in-place-recoding tool
  * as filter

The 'request' typically looks like ''from..to'', where from and to consist of charset/surface. Charsets are e.g. UTF-8 or ISO-8859-1 but also ''HTML'' or ''JAVA''. Common surfaces are line ends: Unix line feeds are the default and need no surface, ''/CR'' stands for carriage returns (classic Mac OS) and ''/CL'' for carriage return plus line feed (Windows). You can also convert e.g. from/to base 64 (''/b64''), hexadecimal (''/x1''), decimal (''/d1'') or quoted printable (''/QP'').

When from, to or the surface are omitted, the default is used. The default charset depends on the system locale, the surface on the charset.

Some in-place examples:
<code bash>
recode HTML file.txt                # from html to default
recode ..HTML file.txt              # from default to html
recode UTF-8..HTML file.txt         # from utf8 to html
</code>

In-place line-end switching:
<code bash>
recode /CL.. file.txt               # convert Windows (CR-LF) to Unix line endings
dos2unix file.txt                   # same
recode ../CL file.txt               # convert to Windows line endings
unix2dos file.txt                   # same
</code>
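
If neither ''recode'' nor ''dos2unix'' is installed, ''sed'' or ''tr'' can do the same conversion. A minimal sketch, assuming GNU sed (which understands ''\r''); the file names are placeholders.

<code bash>
tmp=$(mktemp -d)
printf 'a\r\nb\r\n' > "$tmp/win.txt"
# remove the carriage return at the end of each line (Windows -> Unix), in place
sed -i 's/\r$//' "$tmp/win.txt"
# the opposite direction (Unix -> Windows): append a carriage return to each line
sed 's/$/\r/' "$tmp/win.txt" > "$tmp/back.txt"
# tr alternative for Windows -> Unix: delete every carriage return
tr -d '\r' < "$tmp/back.txt" > "$tmp/unix.txt"
</code>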

Filter example:
<code bash>
cat file.txt | recode /b64 > newfile.txt   # convert base64 encoded file to default encoding
</code>

== iconv - stream-based recoding ==

<code bash>
iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt  # stream-based recoding of file contents to UTF-8
</code>
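
A round trip makes the difference between the encodings listed above visible: in UTF-8, ''ü'' and ''ß'' take two bytes each, in ISO-8859-1 one byte. A sketch, assuming the script itself is saved as UTF-8; the file names are placeholders.

<code bash>
tmp=$(mktemp -d)
printf 'Grüß\n' > "$tmp/utf8.txt"                                   # 7 bytes in UTF-8
iconv -f UTF-8 -t ISO-8859-1 "$tmp/utf8.txt" > "$tmp/latin1.txt"   # 5 bytes in latin1
iconv -f ISO-8859-1 -t UTF-8 "$tmp/latin1.txt" > "$tmp/back.txt"   # identical to utf8.txt again
wc -c "$tmp/utf8.txt" "$tmp/latin1.txt"
</code>

''iconv -l'' lists all encodings the local ''iconv'' knows about.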

== convmv - recoding of file names ==

By default ''convmv'' only prints what it would do; add ''%%--notest%%'' to actually rename the files once the output looks as expected.

<code bash>
convmv -f ISO-8859-1 -t UTF-8 --notest file
</code>


====== Power Management ======

Suspend / hibernate with the package ''pm-utils'' (on systemd-based systems, ''systemctl suspend'' and ''systemctl hibernate'' do the same):
<code bash>
pm-suspend           # suspend to RAM
pm-hibernate         # suspend to disk
</code>

Turn off the screen (X11):
<code bash>
xset dpms force off
</code>