This page covers the basics of working with Linux. Great resources are the man pages and the [[https://www.gnu.org/software/coreutils/manual/html_node/index.html|official documentation of the GNU Coreutils]].

====== Getting Help ======

== type ==
''type'' is a bash builtin that shows the type of a command, e.g. binary, script, alias or builtin. Depending on the result you must use the respective help system.

  type ls       # show type of ls
  type -a kill  # show all commands with the name kill (= one program & one builtin!)

== help ==
''help'' is a bash builtin that gives you help for bash builtins (try ''help help'').

  help       # list all builtin bash commands
  help type  # get help for builtin command 'type'

== man, whatis, apropos ==
''[[http://www.mcmcse.com/linux/man_pages.shtml|man]]'' shows you the man(ual) page for nearly every Linux program, but also for config files and system calls. Man pages are divided into sections; the most important ones are programs (1), config files (5) and sysadmin tools (8). Man pages are preprocessed with the program ''groff'' and displayed using the program ''less''. Use the arrow keys to navigate, type ''/searchphrase'' and Enter to search, and type ''q'' to quit.

  man man       # view man page of man itself
  man passwd    # view man page of program passwd (section 1)
  man 5 passwd  # view man page of /etc/passwd (section 5)

''whatis'' shows the one-line descriptions of man pages whose names match your search phrase.

  whatis
  man -f   # same as whatis
  man -aw  # show full path to man file instead of name only

''apropos'' shows man pages relevant to a topic by searching through both the "Name" and "Description" sections of the man page database.

  apropos
  man -k  # same as apropos

== info ==
''info'' shows the info pages of many GNU programs. Sometimes ''info'' gives more in-depth information than ''man'' (e.g. ''info ls''). Type ''h'' in ''info'' to get help on how to navigate with the keyboard.

  info

== whereis, which ==
''whereis'' shows the paths of a program's binary, source and man page files.
''which'' only shows the full path of a program.

  whereis ls
  which ls

====== Bash ======
''bash'' is a commonly used shell, i.e. a command-line interpreter that typically runs inside a terminal. It is used to run programs and scripts, and to interact with the system in general. [[linux:bash|Configuration, basic usage and bash scripting are explained here.]]

====== Files and Directories ======

===== Exploring the File System =====

== cd - change directory ==

  cd /etc/init.d  # change into directory /etc/init.d
  cd ~            # change into home directory (plain 'cd' does the same)
  cd ..           # change into parent directory
  cd -            # change into the directory you visited before

Note that ''cd'' is a shell builtin, not a program.

== ls - list directory contents ==

  ## list files, check free / used disk space
  ls []       # list files (in dir)
  ls -lah []  # list *all* files in *long format* (permissions, owner, size in *human readable format*,..)
  ls -ld []   # info for directory itself

By default ''ls'' sorts files by name in ascending order. This can be changed using either the short or the long form of the options:

  ls -lr   # reverse order
  ls -lS   # ls -l --sort=size (biggest first)
  ls -lX   # ls -l --sort=extension (ascending)
  ls -lt   # ls -l --sort=time (modification time, newest first)
  ls -lut  # ls -l --sort=time --time=atime (last access time, newest first)
  ls -lct  # ls -l --sort=time --time=ctime (last status change time, newest first; not the creation time)

''ls'' can also be used to search in the current directory. You can make use of bash wildcards here.

  ls *.txt     # show all files ending in .txt in the current directory
  ls -d *dir*  # show all entries whose name contains 'dir' (-d: do not list the contents of matching directories)

=== Disc Space ===

== du - disc usage ==
''du'' always calculates the size of a directory recursively. The options only limit how many lines are printed; the totals do not change.
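The statement that the options only change the amount of output can be checked with a small sketch. It assumes GNU ''du'' and works on a throw-away directory; the directory layout and variable names are made up for illustration.

```shell
# Hypothetical scratch directory with a nested layout (names are illustrative only).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b"
printf 'hello\n' > "$demo_dir/a/file1"
printf 'world\n' > "$demo_dir/a/b/file2"

# Default output: one line per directory; the last line is the start directory.
total_verbose=$(du "$demo_dir" | tail -n 1 | cut -f 1)
# -s prints only the summary line for the start directory.
total_summary=$(du -s "$demo_dir" | cut -f 1)
# -a additionally prints one line per file; the last line is still the start directory.
total_all=$(du -a "$demo_dir" | tail -n 1 | cut -f 1)

echo "du: $total_verbose   du -s: $total_summary   du -a: $total_all"
rm -rf "$demo_dir"
```

All three variables should hold the same block count, since only the number of printed lines differs.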
  du                    # recursively print disk usage of all directories below the current directory (+ the current directory)
  du -sh                # human readable summary of total disk usage, or current directory if no file is given
  du -c                 # grand total is displayed in addition to usage for two directories
  du -h -d              # total disk usage dir and subdirs (show usage for each subdir as well; -d takes the maximum depth)
  du --exclude='*.mp3'  # disk usage except for mp3s
  du -a                 # also print usage for files
  du -s                 # only print a summary line (disk usage of current directory)

== df - disc free ==
''df'' is (kind of) the opposite of ''du'': it shows the remaining space (or inodes) on a file system.

  df     # show free space on all mounted filesystems
  df -h  # use human-readable format
  df -i  # show used / free number of inodes
  df -h  # show free space on the file system where file lies

== GUI approaches ==
Great graphical insights into disk usage are given by programs that analyze a whole directory tree and let you easily identify the files or directories that take up the most space: ''ncdu'' is a text-based (ncurses) program, ''filelight'' a KDE-based GUI program and ''baobab'' aka ''Disk Usage Analyzer'' is the GNOME equivalent.

===== File Handling =====

== file - determine file type ==
This tool determines the file type (and the encoding for text files) by analyzing the file's contents.

  file myfile      # print file type
  file -i myfile   # print MIME type
  file -bi myfile  # print MIME type (without prepending file name)

== touch, mkdir - create ==

  touch file.txt          # creates an empty file using touch
  > file.txt              # creates an empty file using bash redirection (truncates an existing file!)
  mkdir directory         # creates an empty directory
  mkdir -p parent/subdir  # create the whole directory tree (-p = --parents)

''touch'' was actually written to change the timestamps of a file (by default atime and mtime, i.e. access and modification time). By default it also creates the file if it does not exist yet; the parameter ''-c'' suppresses file creation.
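A minimal sketch of the default behaviour versus ''-c'', assuming GNU ''touch'' and GNU ''date'' (for ''date -r''). The file names are hypothetical and live in a throw-away directory.

```shell
# Hypothetical scratch directory; file names are illustrative only.
demo_dir=$(mktemp -d)

# With -c a missing file is NOT created (and touch still exits successfully).
touch -c "$demo_dir/missing.txt"
if [ -e "$demo_dir/missing.txt" ]; then created_with_c=yes; else created_with_c=no; fi

# Without -c the missing file is created as an empty file.
touch "$demo_dir/new.txt"
if [ -e "$demo_dir/new.txt" ]; then created_default=yes; else created_default=no; fi

# Set atime and mtime to a fixed point in time and read the mtime back.
touch --date="2012-05-01 18:12" "$demo_dir/new.txt"
mtime=$(date -r "$demo_dir/new.txt" +%Y-%m-%dT%H:%M)

echo "created with -c: $created_with_c, created by default: $created_default, mtime: $mtime"
rm -rf "$demo_dir"
```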
  touch file.txt                                   # set atime and mtime of file.txt to current time
  touch --date="2012-05-01 18:12" file.txt         # set atime and mtime of file.txt
  touch --time=atime --date="2012-05-01" file.txt  # only set atime of file.txt

== cp - copy ==

  cp srcfile targetfile   # copy files (dereferences symlinks automatically!)
  cp -r srcdir targetdir  # recursively copy directories (GNU cp copies symlinks inside srcdir as symlinks; add -L to dereference them)
  cp -a                   # archive (backup) files: recursive, preserve symlinks, owner, timestamps, mode (roughly cp -Rpd)

Some important parameters:
  * ''-L'': always dereference symlinks
  * ''-P'': do not dereference symlinks
  * ''-d'': do not dereference symlinks and preserve links
  * ''-p'': preserve owner, timestamps, mode (permissions)
  * ''-R'' or ''-r'': recursive copy
  * ''-u'': update - copy only if source file is newer than destination or destination is missing

== mv - move (or rename) ==
The program ''mv'' moves files or directories from one location to another. If both locations are on the same partition, it is a simple rename; otherwise the content is copied and the source is deleted afterwards.

  mv src target  # move a file or directory

When moving a file to an existing target that is also a file, the target is overwritten without any notice. If the existing target is a directory, the file or directory to be moved is moved into that directory. It is therefore advisable to be careful when using ''mv'' with files.

  mv -i srcfile targetfile  # interactive mode (asks if a file should be overwritten)
  mv -n srcfile targetfile  # noclobber mode (does not overwrite files - but also does not print a message)

A practical tool for bulk renaming is the perl program ''rename''. There you can use the full power of perl regular expressions.
[[http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal|More examples.]]

  rename -n s/old/new/ *           # dry-run of what would happen when renaming all files and directories in the current directory
  rename -v s/old/new/ *           # actually rename and print all changes
  rename 's/\.htm$/\.html/' *.htm  # complex example: rename all .htm files to .html

== ln - link ==
To make it easier to remember: as with ''cp'', the first argument is the part that already exists.

  ln src target     # create a hardlink (both files reference the same inode); cp -l src target does the same
  ln -s src target  # create a symbolic link (symlink); cp -s src target does the same

== rm, rmdir - remove ==

  rm file.txt          # remove a file
  rm -r directory      # recursively remove a directory
  rmdir directory      # remove an empty directory
  rmdir -p parent/sub  # remove an empty directory tree

==== Secure File Handling ====

== scp: secure copy ==
''scp'' can copy files between hosts (also between two remote hosts).

Recursively copy ''localDir'' to a remote host:

  scp -r localDir user@host:remoteDir

To copy between two remote hosts, either the source host must have credentials to log into the target host, or you use the issuing host as third party via ''-3'' (recommended).

  scp -3 -r user1@host1:remoteDir2 user2@host2:remoteDir2

== rsync: copy / synchronize directories locally and remotely ==
''rsync'' can keep two directories synchronized intelligently by only transferring changes, which makes it useful for e.g. remote backups.

Simple local usage, where ''localDir'' and all its contents will be copied to ''otherLocalDir/localDir'':

  rsync -a localDir otherLocalDir

When copying remotely, ''ssh'' is used by default.
Quick setup:

  # @destination (where files are copied to)
  # set up ssh server (listens for incoming connections)
  sudo apt-get install openssh-server
  # @source
  ssh-keygen
  ssh-copy-id user@remotehost
  rsync -av localdirectory user@remotehost:directory

[[https://help.ubuntu.com/community/rsync|More on]] [[https://wiki.ubuntuusers.de/rsync|this topic]]

===== File Permissions and Attributes =====
Each file in Linux has
  * a type
  * an owner
  * a group (similar to owner)
  * permissions (can owner/group/others read/write/execute?)
  * attributes

Note that ''root'' can not only change file permissions; they also do not apply to ''root'' (except for the execution of files)! E.g. ''root'' can always write/delete write-protected files or create files in a write-protected directory. To protect files / directories from unwanted changes by ''root'', attributes can be used.

The output of ''ls -l'' looks like e.g. ''drwxrwxrwx''. The first character identifies the file type, followed by three access triples (for owner, group, others).

List of possible values for the file type (first character):
  * - file
  * d directory
  * l symlink
  * c character device (e.g. /dev/random)
  * b block device (e.g. /dev/sda1)
  * p fifo
  * s socket

== chown - change owner ==
With ''chown'' both the owner and the group can be changed. Simple usage examples:

  chown myuser file.txt          # change the owner of file.txt to 'myuser'
  chown myuser:mygroup file.txt  # change the owner of file.txt to 'myuser' and the group to 'mygroup'
  chown :mygroup file.txt        # only change the group of file.txt to 'mygroup'
  chown -R myuser directory      # change the owner of directory recursively to 'myuser'

If a symbolic link is given, by default the referenced file is changed, not the link itself. Use ''-h'' to avoid dereferencing and to change the owner of the symlink itself. In recursive mode symbolic links are not traversed by default (''-P''). With ''-L'' every encountered symlink is traversed.
(FIXME: try this out)

To only change the owner of files owned by a certain user and/or group, use the ''%%--from%%'' option with the ''user:group'' syntax:

  chown -R --from=myuser:mygroup otheruser:othergroup directory

[[http://www.thegeekstuff.com/2012/06/chown-examples|More examples]]

== chgrp - change group ==
FIXME

== chmod - change permission ==
A combination of the letters **ugoa** controls which users' access to the file will be changed:
  * u the user who owns it
  * g other users in the file's group
  * o other users not in the file's group
  * a all users

The letters **rwxXst** select file mode bits for the affected users:
  * r read
  * w write
  * x execute (or search for directories)
  * X execute/search only if the file is a directory or already has execute permission for some user
  * s set user or group ID on execution
  * t restricted deletion flag or sticky bit

The letters for the affected users and the ones for the mode are combined with one of the operators ''='' (set exactly), ''+'' (add) or ''-'' (remove).

  chmod o+rw file  # add permissions for others (everybody else) to read and write file

Explicit setting of all permissions is also possible with four octal digits (e.g. 0644 or 0755 are common). The first digit represents the suid/sgid/sticky bits, the other three the permissions for user, group and others.
  * digit 1: suid=4, sgid=2, sticky=1
  * digits 2-4: read=4, write=2, execute=1

  chmod 0644 file  # do not set suid/sgid/sticky bit; allow owner rw and group/others r

Some advanced examples:

  chmod u=rwxs,g=r,o= file  # same as chmod 4740 file: set suid and read/write/execute access for user,
                            # only read access for group, nothing for others
  chmod -R a+rwX            # recursively give read/write access to a directory tree
                            # (executable bit is only set for directories - and files where
                            # one of ugo already has the executable bit set)

== chattr - change attributes ==
''chattr'' changes file attributes (a=append only, i=immutable, s=secure deletion,..).
  lsattr file     # view file attributes
  chattr +a file  # allow only appends to file (requires root to (un)set)

==== SUID, SGID, Sticky Bit ====

=== suid,sgid on files ===
When users execute an executable with one of those bits set, the effective user ID (suid) or group ID (sgid) of the process is that of the file's owner or group. (This is why regular users can change their entry in ''/etc/shadow'' with ''/bin/passwd''.)

=== sgid on directory ===
All files created in the directory get the same group as the directory, and all new subdirectories inherit the sgid bit.

=== sticky bit ===
In sticky directories only the owner of a file/directory and root can delete it (as in /tmp).

see also: http://www.bashguru.com/2010/03/unixlinux-advanced-file-permissions.html

== umask ==
The [[http://en.wikipedia.org/wiki/Umask|umask]] determines which file permissions are set for files and directories when they are created. Note that the umask definition is negative, i.e. it is removed from the default permissions (e.g. 666 for files, 777 for directories). A typical umask is 022: 666 - 022 = 644, which means the owner can read and write a file, group and others can only read it. Strictly speaking the umask is a bit mask, not a subtraction: every bit set in the umask is cleared in the result.

''umask'' is a bash builtin to set and view the current umask of a user. Typically it is invoked in /etc/profile (or a file referenced from there).

  umask       # print currently active umask (octal)
  umask -S    # print currently active umask (symbolic output)
  umask 0xxx  # set a umask (octal)

===== Search =====
Two important tools can be used for searching files in Linux: ''find'' and ''locate''. The big difference between the two: ''find'' searches the system 'live', ''locate'' uses a database that is typically updated only once a day.

== find ==
In the simplest invocation ''find'' just prints all files in the current directory and all its subdirectories (similar to ''tree'').

  find            # print all filenames in dir + subdirs
  find dir1 dir2  # print many dirs + subdirs

To filter the files, an "expression" must be added as last parameter. An expression can be quite complex and can consist of options, tests and actions.
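How the three parts of an expression work together can be sketched as follows, assuming GNU ''find'' and a throw-away directory with made-up file names. The expression combines an option (''-maxdepth''), a test (''-name'') and an action (''-print'', which is also the default action).

```shell
# Hypothetical scratch directory; file names are illustrative only.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/a.jpg" "$demo_dir/b.txt" "$demo_dir/sub/c.jpg"

# option (-maxdepth 1) + test (-name) + action (-print):
# only .jpg files directly inside demo_dir are printed.
top_level=$(find "$demo_dir" -maxdepth 1 -name '*.jpg' -print)

# Without the option find also descends into sub/ and finds both .jpg files.
jpg_count=$(find "$demo_dir" -name '*.jpg' -print | wc -l)

echo "top level: $top_level"
echo "all levels: $jpg_count file(s)"
rm -rf "$demo_dir"
```

Options are placed before tests here because GNU ''find'' warns when a global option such as ''-maxdepth'' follows a test.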
The most straightforward test is ''-name'':

  find dir1 dir2 -name "*.jpg"   # find all .jpg files in both directories
  find dir1 dir2 -iname "*.jpg"  # same, but ignore case (e.g. also find .JPG)

Some examples for **options**:

  find dir -maxdepth 1          # find all files in dir (but not its subdirs)
  find dir -xdev -name "*.jpg"  # find files only in the current file system
                                # (i.e. does not search NFS shares or /proc)

Some examples for **tests**:

  find dir -size +10M                # find files bigger than 10 megabytes
  find dir -size -50c                # find files smaller than 50 bytes (c=byte, k=kilobyte, M=megabyte, G=gigabyte)
  find /bin -perm -u=s               # find all files in /bin that have the suid bit set
  find dir -perm 644                 # find files with permissions being exactly 0644
  find dir -atime -3                 # find files accessed less than three days ago
  find dir -cmin +10                 # find files whose status (inode) was changed more than 10 minutes ago (for modified files: -mmin/-mtime)
  find /usr/bin -executable -type f  # find all executable files in /usr/bin

Some examples for **actions**: delete files, execute commands

  find dir -exec command {} \;     # execute a command for each of the found files - {} will be replaced with the filename
  find dir -exec command {} +      # execute one command for all of the found files - {} will be replaced with the filenames
  find dir -execdir command {} \;  # same as -exec, but the execution directory is the directory of the file, not the starting directory
  find dir -delete                 # delete found files

http://linux.101hacks.com/linux-commands/find-command-examples/

== locate ==
locate & slocate..

== grep ==
For searching the contents of files ''grep'' can be used:

  # find all files in or below the current directory that contain "searchString"
  grep -r searchString .

===== Split & Merge =====
''split'' can split one file into several files. Splitting can be done by lines (-l N) or into a number of equal chunks (-n N). By default the files are named x plus a two-letter suffix iterating through the alphabet, from xaa to xzz.
The resulting names can be changed: the prefix is an optional argument after the file name, and the suffix length is set with -a.

  split file.txt                     # split file.txt in chunks of 1000 lines and name them xaa, xab,..
  cat xa* > merged.txt               # merge split file again
  split -a 3 -l 1 big.csv splitfile  # create files with the names 'splitfile***', containing one line each
  split -b 700M archive.tar.gz       # split an archive into 700MB pieces

''csplit'' splits files according to a context (regex), hence the c.

  csplit file.txt "/regex/"        # split file at first occurrence of regular expression
  csplit file.txt "/regex/" "{*}"  # split file at all occurrences of regular expression

Merging files is handled by ''cat'' (concatenate):

  cat file1 file2 file3 > mergedfile

===== Zip =====

==== Uncompressed archives ====

  tar -cf archive.tar a b c           # create tar archive of files/folders a, b, c
  tar -tf archive.tar                 # list contents of archive (reads the whole archive, so it also serves as a basic integrity test)
  tar -xf archive.tar -C destination  # extract content of tar archive into folder destination (otherwise: current directory)

''tar'' creates uncompressed archives by default.

==== Compressing files ====
For compressing files the formats gzip (.gz), bzip2 (.bz2) and xz (.xz, which uses LZMA compression) are commonly used. Gzip is the most commonly used one, but according to [[https://www.rootusers.com/gzip-vs-bzip2-vs-xz-performance-comparison|this in-depth comparison]] .xz is the superior format: it is fast to decompress, achieves high compression ratios and also has reasonable compression times up to level 2 or 3.

The following commands (un)compress a single file and then remove the old (un)compressed file.

  gzip my.txt       # creates the gzipped file my.txt.gz and removes my.txt
  gunzip my.txt.gz  # recreates my.txt and removes my.txt.gz; same as gzip -d

The commands ''bzip2'' + ''bunzip2'' and ''xz'' + ''unxz'' work in the same fashion.
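The replace-on-(un)compress behaviour can be observed with a short round trip. This is a sketch that assumes ''gzip'' and ''gunzip'' are installed; the directory and file names are made up, and only gzip is exercised here.

```shell
# Hypothetical scratch directory; file names are illustrative only.
demo_dir=$(mktemp -d)
printf 'some text\n' > "$demo_dir/my.txt"
sum_before=$(cksum < "$demo_dir/my.txt")

# Compressing replaces my.txt with my.txt.gz.
gzip "$demo_dir/my.txt"
after_gzip=$(ls "$demo_dir")

# Decompressing replaces my.txt.gz with my.txt again.
gunzip "$demo_dir/my.txt.gz"
after_gunzip=$(ls "$demo_dir")
sum_after=$(cksum < "$demo_dir/my.txt")

echo "after gzip: $after_gzip, after gunzip: $after_gunzip"
rm -rf "$demo_dir"
```

The checksum before and after the round trip is identical, i.e. the compression is lossless.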
For all of these formats there is a ''cat''-style command that writes the uncompressed data to stdout:

  zcat   # stream uncompressed gzip file to stdout, same as gzip -cd
  bzcat
  xzcat  # same as xz --decompress --stdout

==== Compressed Archives ====
Often you want to compress archives, e.g. as tar.gz files. This can either be done with separate calls to ''tar'' and ''gzip'' or directly with ''tar'':

  tar -czf archive.tar.gz a b c                  # create gzipped archive
  tar -cf archive.tar a b c && gzip archive.tar  # does the same
  tar -xzf archive.tar.gz                        # extract gzipped archive
  gunzip archive.tar.gz && tar -xf archive.tar   # does the same

For creating or extracting compressed archives with ''tar'' use the following flags: gzip (-z), bzip2 (-j), or xz (-J).

==== Working with archives ====

  tar -tf my.tar  # print list of contents
  cpio

''cpio'' can copy files to and from archives FIXME

====== System Information ======
System information can be gathered from many places. Apart from various programs, the two virtual filesystems ''procfs'' and ''sysfs'' are quite important. They allow us to peek into the kernel by providing ''/proc'' and ''/sys''. There, information about processes and other system information is presented in a hierarchical file-like structure - meaning we can use it like every other file in Linux with our favorite tools.

Get help for ''procfs'' with ''man proc'' or on [[http://en.wikipedia.org/wiki/Procfs|Wikipedia]]. ''sysfs'' is the newer of the two and exports kernel data structures, their attributes, and the linkages between them to userspace. Documentation is [[https://www.kernel.org/doc/Documentation/ABI/testing|here]].

===== Hardware Information =====
With ''procfs'' we can get info about the CPU, interrupts (IRQs) and IO ports of devices - the latter two only if the kernel module for the device is loaded.
  cat /proc/cpuinfo     # CPU
  cat /proc/scsi/scsi   # available SCSI devices
  cat /proc/interrupts  # IRQs
  cat /proc/ioports     # IO ports

The following programs also find devices without a working or loaded kernel module by live probing the hardware. If the standard output of these programs is not informative enough, try the verbose flag (-v or even -vv).

  lspci  # list PCI devices - live information about PCI buses in the system and devices connected to them
  lsusb  # list USB devices
  lshw   # extract detailed information on the hardware configuration
         # (basically every possible piece of hardware)

Advanced ''lshw'':

  lshw -html > /tmp/html  # generate a nice html to view it in a browser
  lshw -short             # quick overview
  lshw -class network     # filter by class (see -short for classes)

The tool ''get-edid'' (in the package read-edid) can help to identify monitors (which are not included in lshw or lspci):

  sudo apt-get install read-edid
  sudo get-edid

Available GUI applications include ''kinfocenter'' and ''hardinfo''.

==== Memory ====

  free -m            # quick overview of total, free and used memory (and swap) in megabytes
  cat /proc/meminfo  # in-depth look at memory stats via procfs

==== Sensors ====
Install the package ''lm-sensors'' to view sensor values like temperature or fan rpm:

  sensors-detect  # set up which sensors to use
  sensors         # view all sensor values

===== OS / distribution version =====

  uname -a        # print kernel version, platform (i.e. 32/64 bit),..
  lsb_release -a  # print distribution information, e.g.
                  # release name and version
  cat /etc/os-release  # text file containing about the same information as in lsb_release
  cat /etc/issue       # text file containing login greeting (typically the release name)

===== Kernel =====
Get kernel-related information:

  cat /var/log/dmesg             # kernel messages (also the ones during boot)
  dmesg                          # print all kernel messages in kernel ring buffer (look for error messages there; and local printers)
  cat /proc/cmdline              # kernel boot time arguments
  lsmod                          # show the status of modules in the Linux Kernel
  cat /proc/modules              # loaded kernel modules (see also: lsmod, /etc/modules)
  modinfo                        # get information about kernel modules (e.g. parameters, version)
  cat /lib/modules//modules.dep  # available kernel modules
  lspci -v                       # show hardware and which kernel driver they use

Programs & config files to load or unload kernel modules:

  modprobe               # program to add and remove modules from the Linux Kernel (handles dependencies)
  insmod                 # simple program to insert a module into the Linux Kernel
  rmmod                  # simple program to remove a module from the Linux Kernel
  depmod                 # program to generate modules.dep
  cat /etc/modules.conf  # config file specifying modules loaded at startup

Note: in Ubuntu 12.04 the file ''/etc/modules.conf'' seems to be replaced by the combination of ''/etc/modules'' and ''/etc/modprobe.d/*''.

Other system information:

  cat /var/log/messages  # system log - after start of the logging daemon (since Ubuntu 11.04: /var/log/syslog)
  cat /var/log/boot.log  # system log - also before start of the logging daemon (written by dmesg after boot?)
===== Filesystem information =====

  cat /proc/filesystems  # supported filesystems (currently loaded kernel modules)
  cat /proc/swaps        # current swap partitions
  cat /etc/mtab          # currently mounted partitions (the same output as when calling mount without parameters)
  cat /etc/fstab         # config file for mountpoints

===== Users =====
These programs give (slightly different) information about the currently logged in users (and more):

  w    # currently logged in users & what they are doing (+uptime as first line)
  who  # currently logged in users

===== Uptime & boot date =====

  who -b            # date+time of last system boot
  uptime            # current time, how long the system has been running, how many users are currently logged on
                    # and the system load averages for the past 1, 5, and 15 minutes
  cat /proc/uptime  # uptime of the system (seconds), and the amount of time spent in idle process (seconds)

===== Date & Time =====
''date'' is a versatile tool for formatting dates / times and setting the system time ([[linux:installation#date_timezone|see also]]).

Print the current time in different formats:

  date                 # default - a human readable format
  date -I              # only the date in ISO 8601
  date -Iseconds       # date and time in ISO 8601
  date +%Y-%m-%dT%H%M  # custom format useful for scripts, e.g. 2019-08-14T1129

Convert unix timestamps:

  date -d @1278923870  # print the unix timestamp in a human readable format

====== Working with Text ======

  more file.txt             # page through contents of file
  less file.txt             # page through contents of file more conveniently
  paste file1 file2         # view lines of two files next to each other, separated by a tab
  paste -d ';' a.csv b.csv  # merge the columns of two .csv files line by line

''paste'' allows viewing files next to each other, i.e. it concatenates them line by line (column-wise). This is useful e.g. to merge .csv files or as a simple stand-in for ''diff''. The delimiter can be changed with -d.
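The column-wise merge can be sketched with two tiny input files. The file names and contents below are made up for illustration.

```shell
# Hypothetical scratch directory with two single-column files.
demo_dir=$(mktemp -d)
printf '1\n2\n' > "$demo_dir/a.csv"
printf 'x\ny\n' > "$demo_dir/b.csv"

# Default delimiter: a tab between the corresponding lines of both files.
tabbed=$(paste "$demo_dir/a.csv" "$demo_dir/b.csv")

# -d ';' uses a semicolon instead, which yields a two-column CSV.
merged=$(paste -d ';' "$demo_dir/a.csv" "$demo_dir/b.csv")

printf '%s\n' "$merged"
rm -rf "$demo_dir"
```

Line n of the output consists of line n of the first file, the delimiter, and line n of the second file.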
  nl file.txt                 # view content of file with line numbers
  nl -i 5 -s '|--|' sort.txt  # increment line number by 5 and display the string |--| between line number and line

''[[http://www.thegeekstuff.com/2013/02/wc-nl-examples|nl]]'' numbers lines. The width of the line number can be adjusted (-w N).

  fmt lorem.txt  # reformat text to 75 characters per line

''fmt'' does the same as dynamic word wrap in modern editors. Paragraphs are reformatted to a character width (-w N), optionally with a different indentation for the first line per paragraph (-t).

  rev lorem.txt  # reverse each line and print it to stdout

''rev'' reverses the characters of each line of text.

  pr lorem.txt           # create pages with 66 lines including a header (in one text file)
  pr -2 -l 50 lorem.txt  # 2-column layout, only 50 lines per page

''pr'' converts text files for printing and should be combined with ''fmt'' because it takes lines as-is.

  tee                          # write stdin to stdout
  echo 'hello' | tee file.txt  # writes 'hello' to stdout and into file.txt

''tee'' writes its input to stdout and to as many files as desired. With -a it appends to the files instead of overwriting them.

  wc -l file1.txt file2.txt  # show line count for both files

''wc'' aka 'word count' counts lines (-l), words (-w), (UTF-8) characters (-m), bytes (-c).

===== Filtering & Selecting =====

== tr ==
''[[http://www.thegeekstuff.com/2012/12/linux-tr-command|tr]]'', aka translate characters, is a good supplement to ''sed'' when it comes to special characters. With ''tr'' newlines, spaces, and non-printing characters can easily be replaced, deleted or squeezed. Important arguments are delete (-d), squeeze (-s) and complement (-c).

  tr -d '[:space:]' < lorem.txt  # delete all white-space from lorem.txt and print it to stdout
  tr -s '\n' ' ' < file.txt      # join all lines of a file into a single line
  tr 'A-Za-z' 'N-ZA-Mn-za-m'     # rot13

== cat & tac ==
''cat'' and ''tac'' concatenate files (opposite of ''split'').
''cat'' is also often used to simply print a file to stdout to view its content or pipe it into other programs.

  cat file1 file2  # print file1 and then file2 to stdout
  cat -A file1     # show non-printing characters, tabs, line ends. ugly results for non-ASCII characters.
  tac file1 file2  # print file1 and then file2 to stdout, but reverse the line order for each file

== sort ==
''sort'' can be used to sort the lines of one (or more) files. The ordering can be alphanumeric (default), numeric (-n), human-readable numeric e.g. 2K 1M (-h),... The input can also be sorted in reverse (-r) or randomized (-R).

  sort file1.txt file2.txt   # print sorted lines from all input files
  sort -k 1.3 file.txt       # sort file by the line content starting with the third character
  sort -t ';' -k 3 file.csv  # print lines of csv, sorted by the third column (the key runs to the end of the line; use -k 3,3 to restrict it to that column)
  sort -u file.txt           # sort file and only print unique lines (see also: uniq)

== shuf ==
Shuffle - the opposite of ''sort''. People on StackOverflow claim it's faster than ''%%sort --random-sort%%''.

== uniq ==
With ''uniq'' you can filter (omit) or report adjacent repeated lines.

  uniq file.txt            # filter adjacent repeated lines
  uniq -i -u file.txt      # filter while ignoring case and only print unique lines
  uniq -c -d file.txt      # filter, and only print duplicates and their count
  uniq -s 2 -w 4 file.txt  # filter, but only compare 4 chars not including the first 2 for each line

== head & tail ==
''head'' and ''tail'' allow you to print the first / last lines or bytes of a file.

  head file.txt        # print first 10 lines
  head -n 5 file.txt   # print first 5 lines
  head -c 10 file.txt  # print first 10 bytes (no word-option)
  tail -n 10 file.txt  # print last 10 lines
  tail -n +2 file.txt  # print everything from line 2 on (everything except first line)

''tail'' can also be used to "follow" a file, i.e.
print changes to the screen in real time:

  tail -f file.txt  # print last 10 lines and all lines that are appended to the file in the future
  tail -F file.txt  # same as --follow=name --retry (notices when the file being tailed is deleted and created again)

==== Regular Expressions ====
Regular expressions are a very powerful tool to match strings of text. They come in various flavours; the ones of concern here are POSIX.2 regexes, which are documented in ''man 7 regex''. A great resource, which also lists the slight differences between regular expressions in different programming languages, is [[http://www.rexegg.com|Rex Egg]].

=== Tools making heavy use of regular expressions ===

== awk ==
''awk'' .. [[http://www.pement.org/awk/awk1line.txt|1liners]]

== sed ==
''sed'' aka 'stream editor' is commonly used to **replace strings** in lines of text files. See [[http://sed.sourceforge.net/sed1line.txt|this list of one-liners]] for more inspiration.

  sed
  sed s/abc/xxx/ file.txt                       # simple use-case: replace the first 'abc' in each line of file.txt with 'xxx' and print it to stdout
  sed s/abc/xxx/g file.txt                      # global replace: replace all 'abc' per line
  sed -i -e 's/a c/x x/g' -e 's/a/b/' file.txt  # replace in-place (overwrite file.txt) with -i, and several expressions with -e

Two good tricks to create readable regexes are to (1) quote the whole regex with single quotes and (2) use a different separator than ''/'' when appropriate. See for yourself in this example of replacing all backslashes in a line with slashes.

  sed s/\\\\/\\//g count.txt  # hard to read
  sed s#\\\\#/#g count.txt    # use any character as separator-char
  sed 's#\\#/#g' count.txt    # avoid escaping for the shell. we still must escape the backslash for the regex.

Advanced usage examples:

  sed -n 1~2p file.txt  # only print odd lines (=print line 1 and then print each line at step 2)

By default ''sed'' uses basic regular expressions, in which grouping parentheses must be escaped as ''\('' and ''\)''.
Use the switch ''-E'' to activate extended regular expressions, where groups are written with plain parentheses:

  sed -E 's#(.*),(.*)#\1;\2#g'  # replace the last comma of each line with a semicolon (.* is greedy)

However, even with extended expressions ''sed'' does not support non-greedy quantifiers as PCRE (perl compatible regular expressions) do. A good resort is to simply use ''perl'' itself:

  perl -pe 's/.*thevalue="(.*?)".*/$1/g' file.txt

== grep ==
''grep'' is used to **print selected lines** of a file that match the regex.

  grep regex file.txt           # print lines containing the string 'regex' in file.txt
  grep '^[[:digit:]]' file.txt  # print lines starting with a digit
  grep '[[:alpha:]]$' file.txt  # print lines ending with an alphabetic character
  grep '[1-3]a' file.txt        # print lines containing 1a, 2a or 3a

Flavours of grep:

  rgrep                         # same as grep -r: recursively execute grep for each file under a directory
  fgrep                         # same as grep -F: search for fixed strings, do not interpret the search string as regex
  egrep                         # same as grep -E: use extended regular expressions
  egrep '([1-3]a){2}' file.txt  # print lines containing two consecutive occurrences of 1a, 2a or 3a (e.g. 1a3a)
  egrep 'one|two' file.txt      # print lines containing either 'one' or 'two' (using extended regular expressions)

=== Regular expression tips ===

== Capture groups ==
If you want to insert text with regular expressions, [[http://www.rexegg.com/regex-capture.html|groups]] come in handy. A group is defined by parentheses ''()'' in the find-expression. Everything that is matched by the expression within the parentheses can be inserted into the replace-expression with ''${n}'' or ''\n'', where n is the group number. For example:

  rename 's/(^\d*)/${1}_insert/' 1234_abc.txt  # renames file to: 1234_insert_abc.txt

===== CSV handling à la Database =====

  expand file.txt          # replace each tab with (up to) 8 spaces
  expand -t 4 file.txt     # replace each tab with (up to) 4 spaces
  expand -t 8,16 file.txt  # first replaced tab has last space at position 8, second tab at 16.
Tabs after the last listed tab stop are replaced with a single space.

''expand'' replaces tabs with spaces. ''unexpand'' does the opposite.

  cut -c 3-5 file.txt           # print characters 3 to 5 of each line
  cut -d ';' -f 1,3 csv.csv     # print columns 1 and 3 of a CSV

''[[http://how-to.linuxcareer.com/learning-linux-commands-cut|cut]]'' can be used to cut bytes (-b), characters (-c) or fields (-f) of a tab-separated file; another delimiter can be set with -d. Note that the ''cut'' shipped with Ubuntu 12.04 is buggy: -c is treated as -b, so the tool breaks multi-byte UTF-8 characters.

  join a.csv b.csv                      # join two files on the first field
  join -t ';' -j 2 a.csv b.csv          # join two semicolon-separated files on the second field
  join -t ';' -1 1 -2 4 a.csv b.csv     # join the first field of a.csv on the fourth field of b.csv
  join --header a.csv b.csv             # join and treat the first line of each file as header

''[[http://how-to.linuxcareer.com/learning-linux-commands-join|join]]'' is used to join e.g. two CSVs on a common column; both files must be sorted on that column. By default, fields are separated by blanks, i.e. spaces or tabs. Lines without a matching partner in the other file are omitted, so a join without any equal fields produces empty output. Unpairable lines can be printed with ''-a 1'' or ''-a 2''.

===== Internationalization =====

Text can come in various encodings. ''recode'', ''iconv'' and ''dos2unix'' convert file contents from one encoding to another, ''convmv'' converts filenames from one encoding to another. Common encodings encountered in Austria are:

  * ASCII: 7 bit (128 characters) and base for all other encodings in the list
  * ISO-8859-1 (latin1): old Western European ASCII extension
  * ISO-8859-15 (latin9): slightly updated version of latin1 (e.g. with €)
  * CP-1252: code page used by Western European (e.g. German) versions of Windows (superset of ISO-8859-1)
  * UTF-8: variable-width Unicode encoding

== recode - in-place recoding & filtering ==

''recode'' can operate in two modes:

  * as in-place recoding tool
  * as filter

The 'request' typically looks like ''from..to'', where from and to consist of charset/surface. Charsets are e.g. UTF-8 or ISO-8859-1 but also ''HTML'', ''JAVA''.
Common surfaces for line ends are ''/CR'' (carriage return, as used by classic Mac OS) and ''/CL'' (carriage return and line feed, as used by Windows); a plain line feed (Unix) is the default and needs no surface. You can also convert e.g. from/to base 64 (''/b64''), hexadecimal (''/x1''), decimal (''/d1'') or quoted printable (''/QP''). When from, to or the surface is omitted, the default is used. The default charset depends on the system locale, the default surface on the charset.

Some in-place examples:

  recode HTML file.txt          # from HTML to default
  recode ..HTML file.txt        # from default to HTML
  recode UTF-8..HTML file.txt   # from UTF-8 to HTML

In-place line-end switching:

  recode /CL.. file.txt   # convert Windows line endings to Unix line endings
  dos2unix file.txt       # same
  recode ../CL file.txt   # convert Unix line endings to Windows line endings
  unix2dos file.txt       # same

Filter example:

  recode /b64 < file.txt > newfile.txt   # convert base64 encoded file to default encoding

== iconv - stream-based recoding ==

  iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt   # stream-based recoding of file contents from ISO-8859-1 to UTF-8

== convmv - recoding of file names ==

By default this command only prints what it would do. Add ''%%--notest%%'' once the results are as expected.

  convmv -f ISO-8859-1 -t UTF-8 --notest file

====== Power Management ======

Suspend / hibernate with the package ''pm-utils'':

  pm-suspend     # suspend to RAM
  pm-hibernate   # suspend to disk

Turn off the screen:

  xset dpms force off