Data Science Course IFT6758


Project Description

The goal of this project is to build a system that automatically recognizes the age, gender, and personality of social media users. Given a user's generated content (e.g., text, images, and relations) as input, the system should output that user's age, gender, and personality trait scores.
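To make the input/output contract concrete, here is a minimal sketch of such a system in Python, using a naive baseline that ignores the content entirely. Everything in it is an assumption for illustration: the class names (`UserContent`, `Profile`, `BaselinePredictor`), the age-group labels, and the use of the Big Five trait names. The baseline predicts the most frequent age group and gender and the mean score of each trait in the training data.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean

# Assumption: personality is scored on the Big Five traits.
TRAITS = ("openness", "conscientiousness", "extroversion", "agreeableness", "neuroticism")


@dataclass
class UserContent:
    """Hypothetical container for one user's generated content."""
    user_id: str
    text: str = ""
    image_path: str = ""
    relations: list = field(default_factory=list)  # e.g. IDs of liked pages


@dataclass
class Profile:
    """Hypothetical output record: age group, gender, and trait scores."""
    user_id: str
    age_group: str
    gender: str
    traits: dict


class BaselinePredictor:
    """Content-agnostic baseline: majority class for age group and gender,
    mean training score for each personality trait."""

    def fit(self, profiles):
        self.age_group = Counter(p.age_group for p in profiles).most_common(1)[0][0]
        self.gender = Counter(p.gender for p in profiles).most_common(1)[0][0]
        self.traits = {t: mean(p.traits[t] for p in profiles) for t in TRAITS}
        return self

    def predict(self, user):
        # The content of `user` is ignored; every user gets the same profile.
        return Profile(user.user_id, self.age_group, self.gender, dict(self.traits))


if __name__ == "__main__":
    # Hypothetical training profiles, not taken from the course dataset.
    train = [
        Profile("u1", "18-24", "female", dict.fromkeys(TRAITS, 3.0)),
        Profile("u2", "25-34", "female", dict.fromkeys(TRAITS, 4.0)),
        Profile("u3", "18-24", "male", dict.fromkeys(TRAITS, 3.5)),
    ]
    model = BaselinePredictor().fit(train)
    print(model.predict(UserContent("u4", text="hello")))
```

A real system would replace `BaselinePredictor` with models that actually use the text, image, and relation data; keeping the same `fit`/`predict` shape makes it easy to compare any model against such a baseline.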

Setup

Brief instructions for accessing the server are available here.

Need more information to build your software for the project? Check here.

Scoreboard

To see your team's score, check the scoreboard.

You can find papers here that describe how others have approached the same or very similar problems. To better understand the problem domain, we highly recommend that you read one or more of these papers.

Project Grading

The project counts for 35% of your final grade and is graded out of a total of 35 points, as follows:

Requirements: Give an overview of the prediction task and present a statistical analysis of the data using visualization tools. Present the method you use for each data source/task. Provide an overview of the prediction results you obtained by applying your machine learning methods to the public training set. Note that your score on the scoreboard (Evaluation #5: November 4) should beat the baseline on at least one of the tasks.

Requirements: Give an overview of the prediction task and present a statistical analysis of the data using visualization tools. Present the method you use for each data source/task. Your software must use all three data sources (text, images, and relations). Provide an overview of the prediction results you obtained by applying your machine learning methods to the public training set. Note that your score on the scoreboard (Evaluation #10: December 16) should beat the baseline on all three tasks.
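Checking whether a model beats the baseline on the public training set (for example, on a held-out split) can be done with a few lines of Python. This is a sketch under stated assumptions: it supposes that age and gender are scored by accuracy and each personality trait by root mean squared error (RMSE). The scoreboard's actual metrics are not specified here, and the helper names (`accuracy`, `rmse`, `beats_baseline`) are hypothetical.

```python
from math import sqrt


def _check(y_true, y_pred):
    # Both sequences must be non-empty and aligned user by user.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")


def accuracy(y_true, y_pred):
    """Fraction of users whose predicted label (e.g. gender) is correct."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error for one personality trait's scores."""
    _check(y_true, y_pred)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def beats_baseline(metric, y_true, model_pred, baseline_pred, higher_is_better):
    """True if the model strictly improves on the baseline under `metric`."""
    model_score = metric(y_true, model_pred)
    baseline_score = metric(y_true, baseline_pred)
    if higher_is_better:
        return model_score > baseline_score
    return model_score < baseline_score


if __name__ == "__main__":
    # Hypothetical held-out labels and predictions.
    genders = ["female", "male", "female", "female"]
    print(beats_baseline(accuracy, genders,
                         ["female", "male", "female", "female"],  # model
                         ["female"] * 4,                           # baseline
                         higher_is_better=True))
    print(rmse([3.0, 4.0], [3.5, 3.5]))
```

Accuracy is "higher is better" while RMSE is "lower is better", which is why `beats_baseline` takes that direction as an explicit argument rather than assuming one.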

Your grade for the progress updates is based on a 7-minute in-class presentation (approximately 1-2 slides per team member) and on the results your software has achieved so far.

Deliverables

Group report

(1 upload per team) Provide a write-up of your research in the form of an academic paper that could be submitted to a conference on data mining or machine learning. Your paper should be self-contained: anyone who has read the assigned reading materials from the course should be able to read and understand it. This means you can be brief about machine learning methods described in the assigned readings, but you must provide sufficient detail about the problem domain, the dataset, and any other machine learning methods you used that were not covered in class. The reasons are twofold: (1) describing the problem domain and the dataset allows you to share your paper with interested readers who have not taken the course but have a general knowledge of machine learning; (2) describing machine learning methods not covered in class allows us to evaluate whether you truly understood those methods rather than treating them as a black box. For instance, your paper can be divided into the sections below (if another structure works better for you, don't feel restricted to this one):

Formatting guidelines: up to 8 pages, double column, ACM Proceedings format. If you need more than 8 pages, consider splitting your material into a main paper and an appendix.

Individual report

(1 upload per student) You will also submit a brief individual report (at most one page), which will:

The purpose of the individual report is to facilitate fair grading and to help the instructor understand what you learned from the project.