This week is an introduction to the UNIX terminal, to Git for version control of your code and documents, and to the cloud computing possibilities offered by Amazon EC2.
The UNIX terminal is more or less a necessity for power users, and some of the tools it offers (like sed, awk and grep) greatly enhance your data manipulation capabilities.
Version control is a commonly used tool for software development on larger projects, but its use is spreading to website hosting, personal backup and code sharing for research.
We will introduce the version control system Git and the repository hosting site Github. Finally, we will look at Amazon EC2 instances for easy and fast cloud computing.
After this week, you should know:
- UNIX terminal
  - How to use the following applications: chmod, find, grep, sed, apt-get, nano, cut, sort, uniq, head, tail, less, cat, ssh, wc, echo, man.
  - What the following applications are for: awk, vi, emacs.
  - How to use piping and redirection to combine commands.
  - How to write and run simple bash scripts.
  - What environment variables are and how they are used.
- Git and Github
  - What the following Git commands do: config, init, clone, add, rm, commit, status, checkout, stash, pull, push.
  - How to set up a repository on Github and how to push to and pull from it.
- Amazon EC2
  - How to launch an instance on Amazon EC2, how to shut it down, and how to reconnect to it.
  - A few of the differences between the instance types offered on Amazon EC2.
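As a rough illustration of the terminal goals above (redirection, piping, environment variables and a small bash script), here is a minimal sketch. The file names `fruits.txt` and `greet.sh` and the variable `GREETING` are made up for this example; it runs in a temporary directory so it does not touch your own files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory (hypothetical example data only).
workdir=$(mktemp -d)
cd "$workdir"

# Redirection: '>' writes output to a file, '>>' appends to it.
printf 'pear\napple\npear\nfig\napple\n' > fruits.txt
echo "kiwi" >> fruits.txt

# '<' feeds a file to a command's standard input.
line_count=$(wc -l < fruits.txt)
echo "Lines in fruits.txt: $line_count"

# Piping: '|' sends the output of one command to the next.
# sort orders the lines, uniq drops adjacent duplicates, head keeps the first line.
first_fruit=$(sort fruits.txt | uniq | head -n 1)
echo "First fruit alphabetically: $first_fruit"

# Environment variables: 'export' makes a variable visible to child processes.
export GREETING="hello"

# A tiny script, written with a here-document, made executable with chmod.
cat > greet.sh <<'EOF'
#!/usr/bin/env bash
# $1 is the first command-line argument; GREETING is read from the environment.
echo "$GREETING, $1"
EOF
chmod u+x greet.sh
greeting_output=$(./greet.sh world)
echo "$greeting_output"
```

The script only sees `GREETING` because it was exported; a plain `GREETING="hello"` without `export` would stay local to the current shell.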
- Introduction to the Unix Shell: http://swcarpentry.github.io/shell-novice/
- Introduction to man and less: http://bogojoker.com/unix/fundamentals/man_and_less.html
- Introduction to apt-get: https://www.digitalocean.com/community/tutorials/how-to-manage-packages-in-ubuntu-and-debian-with-apt-get-apt-cache
- Introduction to chmod: http://hints.macworld.com/article.php?story=20001231152532966
- Introduction to environment variables: http://superuser.com/a/284351
- Introduction to Amazon EC2: http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud
- How to set up a free machine on Amazon EC2: http://www.nczonline.net/blog/2011/07/21/quick-and-dirty-spinning-up-a-new-ec2-web-server-in-five-minutes/
Git and Github
- Follow the interactive tutorial here: https://try.github.io/levels/1/challenges/1
- Or read about Git here: http://rogerdudler.github.io/git-guide/ or here: http://software-carpentry.org/v5/novice/git/index.html
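To get a feel for several of the Git commands from the learning goals before trying them against Github, the sketch below runs them entirely on your own machine. It assumes Git is installed, and it uses a local bare repository (`remote.git`) as a stand-in for a repository hosted on Github; the directory names, the file `notes.txt` and the user identity are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Assumption: a local bare repository plays the role of the Github repository.
git init --quiet --bare remote.git

# init + config: create a working repository and tell Git who you are.
git init --quiet local
cd local
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

# add + commit: stage a new file and record it in the history.
echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# push: publish the current branch and remember the remote as its upstream.
git remote add origin "$workdir/remote.git"
git push --quiet -u origin HEAD

# clone: make a second copy, as you would on another machine.
cd "$workdir"
git clone --quiet remote.git copy
cd copy
git config user.name "Example Student"
git config user.email "student@example.com"
echo "second line" >> notes.txt
git commit --quiet -am "Extend notes"
git push --quiet

# pull: bring the new commit back into the first repository.
cd "$workdir/local"
git pull --quiet
line_count=$(grep -c "line" notes.txt)
commit_count=$(git rev-list --count HEAD)
echo "notes.txt has $line_count lines after $commit_count commits"
```

With a real Github repository, the only part that should change is the address given to `git remote add` and `git clone`.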
Write a command that finds the 10 most popular words in a file.
Put this data (https://www.dropbox.com/s/d5c4x905w4jelbu/cars.txt?dl=0) into a file and write a command that removes all rows where the price is more than $10,000.
Using this file (https://www.dropbox.com/s/tjv9pyfrd9ztx8r/dict?dl=0) as a dictionary, write a simple spellchecker that takes input from stdin or a file and outputs a list of words not in the dictionary. One solution gets 721 misspelled words in this Shakespeare file (https://www.dropbox.com/s/bnku7grfycm8ii6/shakespeare.txt?dl=0).
Consider using the command “comm”.
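If you have not used comm before, the toy example below shows what it does on two tiny word lists. The files `known.txt` and `seen.txt` are invented for this illustration and are unrelated to the dictionary and Shakespeare files; note that comm expects both inputs to be sorted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Use plain byte ordering so sort and comm agree on what "sorted" means.
export LC_ALL=C

workdir=$(mktemp -d)
cd "$workdir"

# Two small, already sorted word lists (hypothetical data).
printf 'apple\nbanana\ncherry\n' > known.txt
printf 'apple\nbananna\ncherry\n' > seen.txt

# comm prints three columns:
#   1: lines only in the first file
#   2: lines only in the second file
#   3: lines in both files
# The options -1 and -3 suppress columns 1 and 3, leaving lines unique to seen.txt.
only_in_seen=$(comm -13 known.txt seen.txt)
echo "$only_in_seen"
```

For real text you would first have to split the input into one word per line and sort it before comm can compare it with anything.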
Launch a t2.micro instance on Amazon EC2. Log on to the instance, create some files and install some software (for example git).
(You have to enter your credit card details to create an Amazon account. If you want to make sure you do not spend any money, you can close your account when you are finished with the exercises. If you really don’t want to do this, you can use the GBar instead of Amazon.)
Create a few files locally on your computer. Create a new repository on Github and push your files to this repository. Log on to a t2.micro instance on Amazon EC2 and clone your repository there. Make some changes to the files, push them from the instance, and pull the changes to your local machine.
(If you did not make an Amazon EC2 account in Exercise 1.4, then you should push your files and pull them on the GBar instead.)