This page covers basic usage of / working with Linux.
Great resources are the man pages and the official documentation of the GNU Coreutils.
type
is a builtin command of bash, that shows the type of a command, e.g. binary, script, alias, builtin command,..
Depending on the result you must use the respective help system.
type ls  # show type of ls
type -a kill  # show all commands with the name kill (one program and one builtin!)
help
is a bash builtin that gives you help for bash builtins (try help help
).
help  # list all builtin bash commands
help type  # get help for builtin command 'type'
man
shows you the man(ual) page for nearly every Linux program but also config files or system calls. Man pages are divided in sections, the most important ones are programs (1), config files (5) and sysadmin tools (8).
Man pages are preprocessed with the program groff
and displayed using the program less
. Use the arrow keys to navigate, type /searchphrase
and enter to search, and type q
to quit.
man man  # view man page of man itself
man passwd  # view man page of program passwd (section 1)
man 5 passwd  # view man page of /etc/passwd (section 5)
whatis
shows the one-line descriptions of man pages whose name matches your search phrase.
whatis
man -f  # same as whatis
man -aw  # show full path to man file instead of name only
apropos
shows man pages relevant to a topic by searching through both the “Name” and “Description” sections of the man page database.
apropos
man -k  # same as apropos
info
shows info page for many GNU programs. Sometimes info
gives more in-depth information than man
(e.g. info ls). Type h
in info
to get help on how to navigate with the keyboard.
info
whereis
shows the paths of a program, its config files and man pages.
which
only shows the full path of a program
whereis ls which ls
bash
is a commonly used shell, i.e. a command-line interpreter that typically runs inside a terminal.
It is used to run programs and scripts, and generally to interact with the system.
Here configuration, basic usage and bash scripting are explained.
cd /etc/init.d  # change into directory /etc/init.d
cd or cd ~  # change into home directory
cd ..  # change into parent directory
cd -  # change into last directory you visited before
Note that cd is a shell builtin, not a program.
## list files, check free / used disk space
ls [<dir>]  # list files (in dir)
ls -lah [<dir>]  # list *all* files in *long format* (permissions, owner, size in *human readable format*, ...)
ls -ld [<dir>]  # info for directory itself
By default ls
sorts files by name in ascending order, this can of course be changed using either short or long forms:
ls -lr  # reverse order
ls -lS  # ls -l --sort=size (biggest first)
ls -lX  # ls -l --sort=extension (ascending)
ls -lt  # ls -l --sort=time (modification time, newest first)
ls -lut  # ls -l --sort=time --time=atime (last access time, newest first)
ls -lct  # ls -l --sort=time --time=ctime (last status change of the inode, newest first; this is not the creation time)
ls
can also be used to search in the current directory. You can of course make use of bash wildcards here.
ls *.txt  # show all text files in current directory
ls -d *dir*  # show all entries in current directory whose name contains 'dir' (without listing the contents of matching directories)
du
always calculates the size of a directory recursively. The options limit verbosity of how many lines are printed, but the totals do not change.
du  # recursively print disk usage of all directories below the current directory (+ the current directory)
du -sh <file>  # human readable summary of total disk usage, or current directory if no file is given
du -c <file1> <file2>  # grand total is displayed in addition to usage for two directories
du -h -d <depth>  # total disk usage of dir and subdirs (show usage for each subdir as well)
du --exclude='*.mp3'  # disk usage except for mp3s
du -a  # also print usage for files
du -s  # only print a summary line (disk usage of current directory)
df
is (kind of) the opposite of du
, it shows remaining size (or inodes) on a partition.
df  # show free space on all mounted filesystems
df -h  # use human-readable format
df -i  # show used / free number of inodes
df -h <file>  # show free space on the file system where file lies
Great graphical insights into disc usage are given by programs that analyze a whole directory tree and let you easily identify files or directories that take up the most space:
ncdu
is a text-based (ncurses) program, filelight
a KDE-based GUI program and baobab
aka Disk Usage Analyzer
is the GNOME equivalent.
The tool file determines the file type (and the encoding for text files) by analyzing the file's contents.
file myfile  # print file type
file -i myfile  # print MIME type
file -bi myfile  # print MIME type (without prepending file name)
touch file.txt  # creates an empty file using touch
> file.txt  # creates an empty file using bash redirection
mkdir directory  # creates an empty directory
mkdir -p parent/subdir  # create the whole directory tree (p = parents)
Actually, touch
was written to change times (by default: atime and mtime - access & modification time) of a file. By default it also creates files if they do not exist yet, the parameter -c
suppresses file creation.
touch file.txt  # set atime and mtime of file.txt to current time
touch --date="2012-05-01 18:12" file.txt  # set atime and mtime of file.txt
touch --time=atime --date="2012-05-01" file.txt  # only set atime of file.txt
cp srcfile targetfile  # copy files (dereferences symlinks automatically!)
cp -r srcdir targetdir  # recursively copy directories (GNU cp copies symlinks as symlinks here unless -L is given)
cp -a or cp -Rpd  # archive (backup) files: recursive, preserve symlinks, owner, timestamps, mode
Some important parameters:
-L: always dereference symlinks
-P: do not dereference symlinks
-d: do not dereference symlinks and preserve links
-p: preserve owner, timestamps, mode (permissions)
-R or -r: recursive copy
-u: update - copy only if source file is newer than destination or destination is missing
The program mv
moves files or directories from one location to another. If both locations are on the same partition, it is a simple rename; otherwise the content is copied and the source is deleted.
mv src target # move a file or directory.
When moving a file to an existing target that is also a file the target is overwritten without any notice. If the existing target is a directory, the file or directory to be moved is moved into the directory. It is therefore advisable to be careful when using mv
with files.
mv -i srcfile targetfile  # interactive mode (asks if a file should be overwritten)
mv -n srcfile targetfile  # noclobber mode (does not overwrite files - but also does not print a message)
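The difference between a plain mv and the noclobber mode can be tried safely in a scratch directory. The following is only a minimal sketch; the file names and the temporary directory are made up for illustration.

```shell
# Scratch directory with hypothetical file names
workdir=$(mktemp -d)
echo 'new' > "$workdir/src.txt"
echo 'old' > "$workdir/target.txt"

# -n refuses to overwrite; some coreutils versions report the skip via the exit status
mv -n "$workdir/src.txt" "$workdir/target.txt" || true
after_noclobber=$(cat "$workdir/target.txt")   # target still holds its old content

# a plain mv silently replaces the existing target
mv "$workdir/src.txt" "$workdir/target.txt"
after_plain=$(cat "$workdir/target.txt")

echo "after mv -n: $after_noclobber, after mv: $after_plain"
rm -r "$workdir"
```

In interactive use, mv -i is often the safer default because it asks before overwriting.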
A practical tool for bulk renaming is the perl program rename
. There you can use the full power of perl regular expressions.
rename -n 's/old/new/' *  # dry-run of what would happen when renaming all files and directories in the current directory
rename -v 's/old/new/' *  # actually rename and print all changes
rename 's/\.htm$/\.html/' *.htm  # complex example: rename all .htm files to .html
To make it easier to remember: as with cp
, the first argument is the part that already exists.
ln src target or cp -l src target  # create a hardlink (both names reference the same inode)
ln -s src target or cp -s src target  # create a symbolic link (symlink)
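The practical difference between the two link types shows up when the original name is removed. This is a small sketch under assumptions: the file names are hypothetical and the -ef test operator is a common shell extension rather than strict POSIX.

```shell
workdir=$(mktemp -d)
echo 'data' > "$workdir/original.txt"

ln "$workdir/original.txt" "$workdir/hard.txt"   # second name for the same inode
ln -s original.txt "$workdir/soft.txt"           # relative target, resolved from the link's directory

# -ef is true when both paths refer to the same inode
if [ "$workdir/original.txt" -ef "$workdir/hard.txt" ]; then same_inode=yes; else same_inode=no; fi

rm "$workdir/original.txt"
hard_content=$(cat "$workdir/hard.txt")          # the data is still reachable via the hardlink
# -e follows symlinks, so it fails for a link whose target is gone
if [ -e "$workdir/soft.txt" ]; then soft_state=valid; else soft_state=dangling; fi

echo "same inode: $same_inode, hardlink content: $hard_content, symlink: $soft_state"
rm -r "$workdir"
```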
rm file.txt  # remove a file
rm -r directory  # recursively remove a directory
rmdir directory  # remove an empty directory
rmdir -p parent/sub  # remove an empty directory tree
scp
can copy files between hosts (also between two remote hosts)
Recursively copy localDir
to a remote host:
scp -r localDir user@host:remoteDir
To copy between two remote hosts either the source host must have credentials to log into the target host or you use the issuing host as third party via -3
(recommended).
scp -3 -r user1@host1:remoteDir2 user2@host2:remoteDir2
rsync
can keep two directories synchronized intelligently by only syncing changes, which makes it useful for e.g. remote backups.
Simple local usage, where localDir
and all its contents will be copied to otherLocalDir/localDir
:
rsync -a localDir otherLocalDir
When copying remotely by default ssh
is used. Quick setup:
# @destination (where files are copied to)
# set up ssh server (listens for incoming connections)
sudo apt-get install openssh-server
# @source
ssh-keygen
ssh-copy-id user@remotehost
rsync -av localdirectory user@remotehost:directory
Each file in Linux has
Note that root can not only change file permissions: they also do not apply to root at all (except for the execution of files)! E.g. root can always write to or delete write-protected files or create files in a write-protected directory. To protect files / directories from unwanted changes by root, attributes can be used.
The output of ls -l
e.g. looks like drwxrwxrwx
. The first character identifies the file type, then there follow three access-tuples (for owner, group, others).
List of possible values for the file type (first character):
With chown
both the owner and the group can be changed. Simple usage examples:
chown myuser file.txt  # change the owner of file.txt to 'myuser'
chown myuser:mygroup file.txt  # change the owner of file.txt to 'myuser' and the group to 'mygroup'
chown :mygroup file.txt  # only change the group of file.txt to 'mygroup'
chown -R myuser directory  # change the owner of directory recursively to 'myuser'
If a symbolic link is given, by default the referenced file is changed, not the link itself. Use -h
to avoid dereferencing and to change the owner of the symlink itself.
In recursive mode symbolic links are not traversed by default (-P
). With -L
every encountered symlink is traversed (to be tested).
To only change the owner for files owned by a certain user and/or group use the --from
option with the user:group
syntax:
chown -R --from=myuser:mygroup otheruser:othergroup directory
A combination of the letters ugoa controls which users' access to the file will be changed:
The letters rwxXst select file mode bits for the affected users:
The letter for affected users and the ones for the mode can be combined with one of =+-
. (equals, add, remove)
chmod o+rw file # add permissions for other (everybody) to read and write file
Explicit setting of all permissions is also possible with four octal digits (e.g. 0644 or 0755 are common). The first digit represents the suid/sgid/sticky bits, the other three the permissions for user, group and other.
chmod 0644 file  # do not set suid/sgid/sticky bit, allow owner rw and group/other r
Some advanced examples:
chmod u=rws,g=r,o= file or chmod 4640 file  # set suid, read and write access for user,
                                            # only read access for group, nothing for other
chmod -R a+rwX <dir>  # recursively give read/write access to a directory tree
                      # (executable bit is only set for directories - and files where
                      # one of ugo already has the executable bit set)
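To check what a symbolic mode actually does, the resulting octal mode can be printed after each call. The sketch below assumes GNU stat (the -c '%a' format option); the directory layout is hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/tree"
touch "$workdir/file" "$workdir/tree/plain.txt"

# '=' sets the listed bits exactly, so this equals octal 640
chmod u=rw,g=r,o= "$workdir/file"
file_mode=$(stat -c '%a' "$workdir/file")

# start from restrictive modes, then open the tree for reading
chmod 700 "$workdir/tree"
chmod 600 "$workdir/tree/plain.txt"
chmod -R a+rX "$workdir/tree"
dir_mode=$(stat -c '%a' "$workdir/tree")              # X adds x because it is a directory
plain_mode=$(stat -c '%a' "$workdir/tree/plain.txt")  # no x bit before, so X adds none

echo "file: $file_mode, directory: $dir_mode, plain file: $plain_mode"
rm -r "$workdir"
```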
# chattr - change file attributes (a=append only, i=immutable, s=secure deletion, ...)
lsattr file  # view file attributes
chattr +a file  # allow only appends to file (requires root to (un)set)
When users run executables that have these bits set, the effective uid/gid of the process is that of the file's owner/group (this is why regular users can edit /etc/shadow
with /bin/passwd
)
All files created in a directory with the sgid bit get the same group as the directory, and all new subdirectories inherit the sgid bit.
In sticky directories only owners of a file/directory and root can delete it (as in /tmp)
see also: http://www.bashguru.com/2010/03/unixlinux-advanced-file-permissions.html
The umask determines which file permissions are set for files and directories when they are created. Note that the umask is a negative definition, i.e. its bits are removed from the default permissions (666 for files, 777 for directories). A typical umask is 022: 666 with the bits of 022 removed gives 644 (for such values this equals the subtraction 666 - 022), which means the owner can read and write the file, group and other can only read.
umask
is a bash built-in to set & view the current umask of a user. Typically it is invoked in /etc/profile (or a file referenced from there).
umask  # print currently active umask (octal)
umask -S  # print currently active umask (symbolic output)
umask 0xxx  # set a umask (octal)
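The umask is applied as a bit mask rather than by arithmetic subtraction; for common values like 022 both views give the same result. The following sketch reproduces the calculation with shell arithmetic (the variable names are made up for illustration).

```shell
# default mode AND NOT umask, printed in octal
file_mode=$(printf '%o' $(( 0666 & ~0022 )))
dir_mode=$(printf '%o' $(( 0777 & ~0022 )))

# with a umask like 033 plain subtraction would suggest 633,
# but masking only clears bits that are actually set
odd_mode=$(printf '%o' $(( 0666 & ~0033 )))

echo "files: $file_mode, directories: $dir_mode, umask 033 on files: $odd_mode"
```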
Two important tools can be used for searching files in Linux: find
and locate
.
The big difference between those two: find
searches the system 'live', locate
uses a database that is typically updated only once a day.
In the simplest invocation find
just prints all files in the current directory and all its subdirectories (similar to tree
).
find  # print all filenames in dir + subdirs
find dir1 dir2  # print many dirs + subdirs
To filter the files an “expression” must be added as last parameter. An expression can be quite complex and can consist of options, tests and actions. The most straight-forward test is -name
:
find dir1 dir2 -name "*.jpg"  # find all .jpg files in both directories
find dir1 dir2 -iname "*.jpg"  # same, but ignore case (e.g. also find .JPG)
Some examples for options
find dir -maxdepth 1  # find all files in dir (but not its subdirs)
find dir -xdev -name "*.jpg"  # find files only in the current file system (i.e. does not search NFS shares or /proc)
Some examples for tests:
find dir -size +10M  # find files bigger than 10 MiB
find dir -size -50c  # find files smaller than 50 bytes (c=bytes, k=KiB, M=MiB, G=GiB)
find /bin -perm -u=s  # find all files in /bin that have the suid bit set
find dir -perm 644  # find files with permissions being exactly 0644
find dir -atime -3  # find files accessed less than three days ago
find dir -cmin +10  # find files whose status (inode) changed more than 10 minutes ago (for modified content: -mmin/-mtime)
find /usr/bin -executable -type f  # find all executable files in /usr/bin
Some examples for actions: delete files, execute commands
find dir -exec command {} \;  # execute a command for each of the found files - {} will be replaced with the filename
find dir -exec command {} +  # execute one command for all of the found files - {} will be replaced with the filenames
find dir -execdir command {} \;  # same as -exec, but the execution directory is the directory of the file, not the starting directory
find dir -delete  # delete found files
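Options, tests and actions can be combined in a single call. The sketch below builds a small hypothetical directory tree and counts matches; -iname and -maxdepth are extensions of GNU (and BSD) find, not part of POSIX.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
touch "$workdir/a.jpg" "$workdir/sub/b.JPG" "$workdir/c.txt"

# -iname ignores case; '-exec ... {} +' hands all matches to one printf call
jpg_count=$(find "$workdir" -type f -iname '*.jpg' -exec printf '%s\n' {} + | wc -l)

# -maxdepth 1 skips sub/, and -name is case-sensitive
top_level=$(find "$workdir" -maxdepth 1 -type f -name '*.jpg' | wc -l)

echo "jpg files: $jpg_count, top level only: $top_level"
rm -r "$workdir"
```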
http://linux.101hacks.com/linux-commands/find-command-examples/
locate & slocate..
For searching contents of files grep
can be used:
# find all files in or below the current directory that contain "searchString" grep -r searchString .
split
can split one file into several files. Splitting can be done by lines (-l N) or into a number of equal chunks (-n N). By default files are named x__: x plus a two-letter suffix iterating through the alphabet, from xaa to xzz. The resulting names can be changed: the prefix is an optional argument after the file name, the suffix length is set with -a.
split file.txt  # split file.txt in chunks of 1000 lines and name them xaa, xab, ...
cat xa* > merged.txt  # merge split file again
split -a 3 -l 1 big.csv splitfile  # create files with the names 'splitfile***', containing one line each
split -b 700M archive.tar.gz  # split an archive into 700MB pieces
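Splitting and merging should give back the original file byte for byte. A minimal round-trip sketch, with made-up file names and a custom prefix instead of the default x:

```shell
workdir=$(mktemp -d)
seq 1 2500 > "$workdir/numbers.txt"                      # sample file with 2500 lines

split -l 1000 "$workdir/numbers.txt" "$workdir/part_"    # creates part_aa, part_ab, part_ac
piece_count=$(ls "$workdir"/part_* | wc -l)

# the shell expands the glob in sorted order, so the pieces are merged in sequence
cat "$workdir"/part_* > "$workdir/merged.txt"
if cmp -s "$workdir/numbers.txt" "$workdir/merged.txt"; then roundtrip=ok; else roundtrip=broken; fi

echo "$piece_count pieces, roundtrip: $roundtrip"
rm -r "$workdir"
```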
csplit
splits files according to a context (regex), hence the c.
csplit file.txt "/regex/"  # split file at first occurrence of regular expression
csplit file.txt "/regex/" "{*}"  # split file at all occurrences of the regular expression
Merging files is handled by cat
(concatenate)
cat file1 file2 file3 > mergedfile
tar -cf archive.tar a b c  # create tar archive of file/folders a, b, c
tar -tf archive.tar  # list contents of archive (actually: tests integrity of archive)
tar -xf archive.tar -C destination  # extract content of tar archive into folder destination (otherwise: current directory)
tar
creates uncompressed archives by default.
For compressing files the compression formats gzip (.gz), bzip2 (.bz2) and xz (.xz, which uses LZMA compression) are commonly used.
Gzip is the most commonly used one, but according to this in-depth comparison .xz is the superior format - it is fast to decompress, achieves high compression rates and also has reasonable compression times until level 2 or 3.
The following commands (un)compress a single file and then remove the old (un)compressed file.
gzip my.txt  # creates the gzipped file my.txt.gz and removes my.txt
gunzip my.txt.gz  # recreates my.txt and removes my.txt.gz; same as gzip -d
The commands bzip2
+ bunzip2
and xz
+ unxz
work in the same fashion.
For all of these formats there is a cat
-style alias which allows for the creation of an uncompressed data stream:
zcat  # stream uncompressed gzip file to stdout, same as gzip -cd
bzcat
xzcat  # same as xz --decompress --stdout
Often you want to compress archives as e.g. tar.gz files. This can either be done with separate calls to tar
and gzip
or directly with tar
:
tar -czf archive.tar.gz a b c  # create gzipped archive
tar -cf archive.tar a b c && gzip archive.tar  # does the same
tar -xzf archive.tar.gz  # extract gzipped archive
gunzip archive.tar.gz && tar -xf archive.tar  # does the same
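A complete create, list and extract cycle can be sketched as follows. All paths are hypothetical and live in a temporary directory; -C is used so that the archive stores relative paths.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/out"
echo 'hello' > "$workdir/src/a.txt"

# -C changes into the directory first, so the archive contains 'src/a.txt'
tar -czf "$workdir/archive.tar.gz" -C "$workdir" src
listing=$(tar -tzf "$workdir/archive.tar.gz")

# extract into a different folder and read the restored file
tar -xzf "$workdir/archive.tar.gz" -C "$workdir/out"
restored=$(cat "$workdir/out/src/a.txt")

echo "$listing"
rm -r "$workdir"
```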
For tarring or untarring compressed archives use the following flags: gzip (-z), bzip2 (-j), or xz (-J).
tar -tf my.tar # print list of contents
cpio
cpio
can copy files to and from archives
System information can be gathered from many places. Apart from various programs the two virtual filesystems procfs
and sysfs
are quite important. They allow us to peek into the kernel by providing /proc
and /sys
. There, information about processes and other system information is presented in a hierarchical file-like structure - meaning we can use it like every other file in Linux with our favorite tools.
Get help for procfs
with man proc
or on Wikipedia.
Sysfs
is a successor of procfs and exports kernel data structures, their attributes, and the linkages between them to userspace. Documentation is here.
With procfs we can view information about the CPU, interrupts (IRQs) and IO ports of devices - the latter two only if the kernel module for the devices is loaded.
cat /proc/cpuinfo  # CPU
cat /proc/scsi/scsi  # available SCSI devices
cat /proc/interrupts  # IRQs
cat /proc/ioports  # IO ports
These programs also find devices without working or loaded kernel module by live probing the hardware. If the standard output of these programs is not informative enough, try the verbose flag (-v or even -vv).
lspci  # list PCI devices - live information about PCI buses in the system and devices connected to them
lsusb  # list USB devices
lshw  # extract detailed information on the hardware configuration (basically every possible piece of hardware)
Advanced lshw
lshw -html > /tmp/html  # generate a nice html to view it in a browser
lshw -short  # quick overview
lshw -class network  # filter by class (see -short for classes)
The tool get-edid
(in the package read-edid) can help to identify monitors (which are not included in lshw or lspci):
sudo apt-get install read-edid
sudo get-edid
GUI applications available include kinfocenter
or hardinfo
.
free -m  # quick overview of total, free and used memory (and swap) in megabytes
cat /proc/meminfo  # in-depth look at memory stats via procfs
Install the package lm-sensors
to view sensor values like temperature or fan rpm:
sensors-detect  # set up which sensors to use
sensors  # view all sensor values
uname -a  # print kernel version, platform (i.e. 32/64 bit), ...
lsb_release -a  # print distribution information, e.g. release name and version
cat /etc/os-release  # text file containing about the same information as in lsb_release
cat /etc/issue  # text file containing login greeting (typically the release name)
Get kernel-related information:
cat /var/log/dmesg  # kernel messages (also the ones during boot)
dmesg  # print all kernel messages in kernel ring buffer (look for error messages there; and local printers)
cat /proc/cmdline  # kernel boot time arguments
lsmod  # show the status of modules in the Linux Kernel
cat /proc/modules  # loaded kernel modules (see also: lsmod, /etc/modules)
modinfo  # get information about kernel modules (e.g. parameters, version)
cat /lib/modules/<kernelversion>/modules.dep  # available kernel modules
lspci -v  # show hardware and which kernel driver they use
Programs & config files to load or unload kernel modules:
modprobe  # program to add and remove modules from the Linux Kernel (handles dependencies)
insmod  # simple program to insert a module into the Linux Kernel
rmmod  # simple program to remove a module from the Linux Kernel
depmod  # program to generate modules.dep
cat /etc/modules.conf  # config file specifying modules loaded at startup
Note: in Ubuntu 12.04 the file /etc/modules.conf
seems to be replaced by the combination of /etc/modules
and /etc/modprobe.d/*
.
Other system information:
cat /var/log/messages  # system log - after start of the logging daemon (since Ubuntu 11.04: /var/log/syslog)
cat /var/log/boot.log  # system log - also before start of the logging daemon (written by dmesg after boot?)
cat /proc/filesystems  # supported filesystems (currently loaded kernel modules)
cat /proc/swaps  # current swap partitions
cat /etc/mtab  # currently mounted partitions (the same output as when calling mount without parameters)
cat /etc/fstab  # config file for mountpoints
These programs give (slightly different) information about the currently logged in users (and more)
w  # currently logged in users & what they are doing (+uptime as first line)
who  # currently logged in users
who -b  # date+time of last system boot
uptime  # current time, how long the system has been running, how many users are currently logged on
        # and the system load averages for the past 1, 5, and 15 minutes
cat /proc/uptime  # uptime of the system (seconds), and the amount of time spent in idle process (seconds)
date
is a versatile tool for formatting dates / times and setting the system time (see also).
Print the current time in different formats
date  # default - a human readable format
date -I  # only the date in ISO 8601
date -Iseconds  # date and time in ISO 8601
date +%Y-%m-%dT%H%M  # custom format useful for scripts, e.g. 2019-08-14T1129
Convert unix timestamps
date -d @1278923870 # print the unix timestamp in a human readable format
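Timestamps can be converted in both directions. The sketch below assumes GNU date (the -d option is not available in all implementations) and uses -u so that the result does not depend on the local time zone.

```shell
stamp=1278923870

# timestamp -> readable UTC time
readable=$(date -u -d "@$stamp" '+%Y-%m-%d %H:%M:%S')

# readable UTC time -> timestamp (with -u the input is interpreted as UTC)
back=$(date -u -d "$readable" +%s)

# the epoch itself is a handy sanity check
epoch_start=$(date -u -d @0 +%Y-%m-%dT%H:%M:%S)

echo "$stamp -> $readable -> $back (epoch starts at $epoch_start)"
```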
more file.txt  # page through contents of file
less file.txt  # page through contents of file more conveniently
paste file1 file2  # view lines of two files next to each other, separated by a tab
paste -d ';' a.csv b.csv  # merge the columns of two .csv files line by line
paste
allows viewing files next to each other, i.e. it concatenates them column-wise, line by line. This is useful to e.g. merge .csv files or as a simple drop-in for diff
. The delimiter can be changed with -d.
nl file.txt  # view content of file with line numbers
nl -i 5 -s '|--|' sort.txt  # increment line number by 5 and display the string |--| between line number and line
nl
numbers lines. The width of the line-number can be adjusted (-w N).
fmt lorem.txt # reformat text to 75 characters per line
fmt
does the same as dynamic word wrap in modern editors. Paragraphs are reformatted to a character width (-w N), optionally with a different indentation for the first line per paragraph (-t).
rev lorem.txt # reverse each line and print it to stdout
rev
reverses lines of text
pr lorem.txt  # create pages with 66 lines including a header (in one text file)
pr -2 -l 50 lorem.txt  # 2-column layout, only 50 lines per page
pr
converts text files for printing and should be combined with fmt
because it takes lines as-is.
tee  # write stdin to stdout
echo 'hello' | tee file.txt  # writes 'hello' to stdout and into file.txt
tee
writes its input to stdout and to as many files as desired. Files can be appended to (-a).
wc -l file1.txt file2.txt  # show line count for both files (plus a total)
wc
aka 'word count' counts lines (-l), words (-w), (UTF-8) characters (-m) and bytes (-c).
tr
, aka translate characters, is a good supplement to sed
when it comes to special characters. With tr
newlines, spaces, and non-printing characters can easily be replaced, deleted or squeezed. Important arguments are delete (-d), squeeze (-s) and complement (-c).
tr -d '[:space:]' < lorem.txt  # delete all white-space from lorem.txt and print it to stdout
tr -s '\n' ' ' < file.txt  # join all lines of a file into a single line
tr '[A-Za-z]' '[N-ZA-Mn-za-m]'  # rot13
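Since tr only reads from stdin, it is usually fed through a pipe or a redirection. A short sketch of translating and squeezing; the helper function rot13 is a made-up name for illustration:

```shell
# hypothetical helper: applying rot13 twice yields the original text
rot13() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

encoded=$(printf 'Hello, World!\n' | rot13)
decoded=$(printf '%s\n' "$encoded" | rot13)

# -s squeezes runs of the given character into a single one
squeezed=$(printf 'a   b\n' | tr -s ' ')

echo "$encoded / $decoded / $squeezed"
```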
cat
and tac
concatenate files (opposite of split
). cat
is also often used to simply print a file to stdout to view its content or pipe it into other programs.
cat file1 file2  # print file1 and then file2 to stdout
cat -A file1  # show non-printing characters, tabs, line ends. ugly results for non-ASCII characters.
tac file1 file2  # print file1 and then file2 to stdout, but reverse the line order for each file
sort
can be used to sort lines in one (or more) files. The ordering can be alphanumeric (default), numeric (-n), human-readable numeric e.g. 2K 1M (-h), ... The input can also be sorted in reverse (-r) or randomized (-R).
sort file1.txt file2.txt  # print sorted lines from all input files
sort -k 1.3 file.txt  # sort file by the line-content starting with the third character
sort -t ';' -k 3 file.csv  # print lines of csv, sorted by the third column
sort -u file.txt  # sort file and only print unique lines (see also: uniq)
shuf (shuffle) is the opposite of sort
.
People on StackOverflow claim it's faster than sort --random-sort
.
With uniq
you can filter (omit) or report adjacent repeated lines.
uniq file.txt  # filter adjacent repeated lines
uniq -i -u file.txt  # filter while ignoring case and only print unique lines
uniq -c -d file.txt  # filter, and only print duplicates and their count
uniq -s 2 -w 4 file.txt  # filter, but only compare 4 chars not including the first 2 for each line
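Because only adjacent repetitions are detected, uniq is usually combined with sort. The following sketch, using made-up sample data, shows the difference and a typical frequency count:

```shell
input=$(printf 'pear\napple\npear\napple\npear\n')

# no two equal lines are adjacent here, so uniq alone removes nothing
unsorted_lines=$(printf '%s\n' "$input" | uniq | wc -l)

# after sorting, equal lines are neighbours and collapse
sorted_lines=$(printf '%s\n' "$input" | sort | uniq | wc -l)

# count occurrences and put the most frequent line first
most_common=$(printf '%s\n' "$input" | sort | uniq -c | sort -rn | head -n 1)

echo "unsorted: $unsorted_lines, sorted: $sorted_lines, most common: $most_common"
```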
head
and tail
allow you to print the first / last lines or bytes of a file.
head file.txt  # print first 10 lines
head -n 5 file.txt  # print first 5 lines
head -c 10 file.txt  # print first 10 bytes (no word-option)
tail -n 10 file.txt  # print last 10 lines
tail -n +2 file.txt  # print everything from line 2 on (everything except first line)
tail
can also be used to “follow” a file, i.e. print changes to the screen in real time:
tail -f file.txt  # print last 10 lines and all lines that are appended to the file in the future
tail -F file.txt  # same as --retry -f (realizes when the file to be tailed is deleted and created again)
Regular expressions are a very powerful tool to match strings of text. They come in various flavours; the ones of concern here are POSIX.2 regexes, which are documented in man 7 regex
.
A great resource that also lists the slight differences between regular expressions in different programming languages and tools is Rex Egg
awk
.. 1liners
sed
aka 'stream editor' is commonly used to replace strings in lines of text files. See this list of one-liners for more inspiration.
sed s/abc/xxx/ file.txt  # simple use-case: replace the first 'abc' in each line of file.txt with 'xxx' and print it to stdout
sed s/abc/xxx/g file.txt  # global replace: replace all 'abc' per line
sed -i -e 's/a c/x x/g' -e 's/a/b/' file.txt  # replace in-place (overwrite file.txt) with -i, and several expressions with -e
Two good tricks to create readable regexes are to (1) escape the whole regex with single quotes and (2) use a different separator than /
when appropriate. See for yourself in this example of replacing all backslashes in a line with slashes.
sed s/\\\\/\\//g count.txt  # wtf
sed s#\\\\#/#g count.txt  # use any character as separator-char
sed 's#\\#/#g' count.txt  # avoid escaping for the shell. we still must escape the backslash for the regex.
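The effect of both tricks can be checked on a single string instead of a file. A small sketch with a made-up Windows-style path:

```shell
win_path='C:\Users\me\file.txt'

# single quotes protect the backslashes from the shell, '#' avoids escaping the slash
unix_style=$(printf '%s\n' "$win_path" | sed 's#\\#/#g')

# same substitution with '/' as separator: the slash must be escaped as well
same_with_slash=$(printf '%s\n' "$win_path" | sed 's/\\/\//g')

echo "$unix_style"
```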
Advanced usage examples:
sed -n 1~2p file.txt #only print odd lines (=print line 1 and then print each line at step 2)
By default sed
uses basic regular expressions, where e.g. grouping parentheses must be escaped as \( and \). Use the switch -E
to activate extended regular expressions and use matching groups without escaping:
sed -E 's#(.*),(.*)#\1;\2#g'  # replace the last comma in each line with a semicolon
However, even with extended expressions sed
does not support non-greedy expressions as it is possible with PCRE (perl compatible regular expressions). A good resort is to simply use perl
itself:
perl -pe 's/.*thevalue="(.*?)".*/\1/g' file.txt
grep
is used to print selected lines of a file, that match the regex.
grep regex file.txt  # print lines containing the string 'regex' in file.txt
grep '^[[:digit:]]' file.txt  # print lines starting with a digit
grep '[[:alpha:]]$' file.txt  # print lines ending with an alphabetic character
grep '[1-3]a' file.txt  # print lines containing 1a, 2a or 3a
Flavours of grep:
rgrep  # same as grep -r: recursively execute grep for each file under a directory
fgrep  # same as grep -F: search for fixed strings, do not interpret the search string as regex
egrep  # same as grep -E: use extended regular expressions
egrep '([1-3]a){2}' file.txt  # print lines containing 1a, 2a or 3a twice in a row (e.g. 1a3a)
egrep 'one|two' file.txt  # print lines containing either 'one' or 'two' (using extended regular expressions)
If you want to insert text with regular expressions, groups come in handy. A group is defined by parentheses ()
in the find-expression. All that is matched by the expression within the parentheses can be added to the replace-expression with ${n}
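The patterns above can be tried on a few lines of sample input. The sketch uses grep -E instead of egrep, since recent GNU grep versions print a warning for the egrep and fgrep aliases; the sample data is made up.

```shell
sample=$(printf '1a2a\n3a\nfoo one\nbar\n')

# a repetition count applies to the whole group: two consecutive matches of [1-3]a
twice=$(printf '%s\n' "$sample" | grep -E '([1-3]a){2}')

# -c prints the number of matching lines instead of the lines themselves
either=$(printf '%s\n' "$sample" | grep -E -c 'one|two')
digit_start=$(printf '%s\n' "$sample" | grep -c '^[[:digit:]]')

echo "twice: $twice, one|two: $either, starting with digit: $digit_start"
```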
or \{n}
, where n is the group number. For example:
rename 's/(^\d*)/${1}_insert/' 1234_abc.txt # renames file to: 1234_insert_abc.txt
expand file.txt  # replace each tab with (up to) 8 spaces
expand -t 4 file.txt  # replace each tab with (up to) 4 spaces
expand -t 8,16 file.txt  # first replaced tab has last space at position 8, second tab at 16. Other tabs are replaced with a single space.
expand
replaces tabs with spaces. unexpand
does the opposite.
cut -c 3-5 file.txt  # print characters 3 to 5 of each line
cut -d ';' -f 1,3 csv.csv  # print columns 1 and 3 of a CSV
cut
can be used to cut bytes (-b), characters (-c) or fields of a tab-separated file (-f). Currently cut is buggy in Ubuntu 12.04, -c is treated as -b and so the tool breaks for UTF-8 characters.
join a.csv b.csv  # join two files on first field
join -t ';' -j 2 a.csv b.csv  # join two semicolon-separated files on the second field
join -t ';' -1 1 -2 4 a.csv b.csv  # join first field of a.csv on fourth field of b.csv
join --header a.csv b.csv  # joins and treats first line as header
join
is used to join e.g. two CSVs on a common column, which must be sorted. The default field separator is blanks, i.e. spaces or tabs.
By default the result for a non-successful join (no equal fields) is empty, because unpairable lines are omitted. These can be printed with -a.
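The handling of unpairable lines is easiest to see with two tiny files. A sketch with hypothetical, already sorted sample data:

```shell
workdir=$(mktemp -d)
printf '1;alice\n2;bob\n3;carol\n' > "$workdir/names.csv"
printf '1;vienna\n3;graz\n' > "$workdir/cities.csv"

# default: only lines whose join field exists in both files
inner=$(join -t ';' "$workdir/names.csv" "$workdir/cities.csv")

# -a 1 additionally prints unpairable lines from the first file
with_unpaired=$(join -t ';' -a 1 "$workdir/names.csv" "$workdir/cities.csv")

printf '%s\n' "$with_unpaired"
rm -r "$workdir"
```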
Text can come in various encodings. recode
, iconv
and dos2unix
convert file contents from one encoding to another, convmv
converts filenames from one encoding to another.
Common encodings encountered in Austria are:
recode
can operate in two modes:
The 'request' typically looks like from..to
, where from and to consist of charset/surface. Charsets are e.g. UTF-8 or ISO-8859-1 but also HTML
, JAVA
. Common surfaces for line ends are carriage return /CR
(classic Mac OS) or carriage return and line feed /CL
(Windows); Unix line feeds need no surface. You can also convert e.g. from/to base 64 (/b64
), hexadecimal (/x1
), decimal (/d1
) or quoted printable (/QP
).
When from, to or the surface are omitted, the default is used. The default charset depends on the system locale, the surface on the charset.
Some in-place examples:
recode HTML file.txt  # from HTML to default
recode ..HTML file.txt  # from default to HTML
recode UTF-8..HTML file.txt  # from UTF-8 to HTML
In-place line-end switching:
recode /CL.. file.txt  # convert Windows (CR-LF) line endings to Unix line endings
dos2unix file.txt  # same
recode ../CL file.txt  # convert to Windows line endings
unix2dos file.txt  # same
Filter example:
cat file.txt | recode /b64 > newfile.txt # convert base64 encoded file to default encoding
iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt # stream-based recoding of file contents to UTF-8
By default convmv only prints what it would do, use --notest
when the results are as expected.
convmv -f ISO-8859-1 -t UTF-8 --notest file
Suspend / Hibernate with the package pm-utils
pm-suspend  # suspend to RAM
pm-hibernate  # suspend to disk
Turn off screen
xset dpms force off