The final part of 02807 is the project. The idea is to get your hands dirty by working on a larger project that involves a large dataset.
You have to get your project proposal accepted before you start.
You will have 4 weeks to work on the project. At the end, you hand in a report describing what you have done.
After 2 weeks, you hand in what you have created so far. This makes sure that you get started and that you receive feedback on your project along the way. The mid-way hand-in also counts towards your final grade, but the final project report counts twice as much as the mid-way report.
- Hand in as an individual or in groups of 2 or 3 people.
- Hand in through Peergrade.
- Hand in what you have after 2 weeks, and what you have after 4 weeks (the final project).
- Submit your project idea here: https://docs.google.com/forms/d/e/1FAIpQLSfde0qUKpbo7DYVrIfSEmj0L1ukMQ80So863Pue9UhCfTYWkg/viewform and I will verify that the idea is suitable for the project.
- If I find that too many groups are working on the same idea, I will stop accepting it. In that case, the first people to submit it get to work on it.
The project will be evaluated on the following general points:
- Is the project interesting (are you doing something cool with your data)?
- Is the project relevant to the course (is what you are doing relevant in the context of computational tools for big data)?
- Does the project involve working with a large amount of data (are you actually processing a lot of data, or just a few small text files)?
- Does the project have sufficient technical depth (is it challenging enough)?
- Is the report well-written (good explanations, figures, etc.)?
These criteria are the same for both the mid-way hand-in and the final hand-in.
Peergrade setup for the project
To make sure that you have time to work on your projects, and to give you more feedback, we use the following setup:
- Only half of you will have to evaluate the mid-way projects.
- The other half of you will have to evaluate the final projects.
- The assistant teachers and I will also give feedback on the mid-way projects.
When peer grading opens for the mid-way projects, you can see in Peergrade whether you have been assigned anything to evaluate.
Subjects and data for the project
The project should be relevant to the course, but it should not simply repeat a subject we have already covered. Think of it as "How can I apply what I have learned?". You have to work with a dataset of a proper size (if in doubt, aim for at least 10GB).
You can find large datasets online, for example here: https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
You can also check out this interesting article: http://blaze.pydata.org/blog/2015/09/16/reddit-impala/
There is also a lot of data from self-driving cars here: https://github.com/udacity/self-driving-car