I have a massive data set from the 1994 U.S. census. It lists each surveyed person's age, education, marital-status, job, race, gender, capital-gain/loss, hours-per-week, native-country, and whether they make less than or more than 50k a year. What questions do you guys want me to look into? (The data set is from https://archive.ics.uci.edu/ml/datasets/adult)
Here's an example question with an example graph I made myself: "Who is smarter, males or females?" |dw:1570924234468:dw|
|dw:1570925095815:dw| Every first-grade dropout earned less than 50k :( sad
|dw:1570925188921:dw| Of course there's that one dude who wins the jackpot on investments. I'm going to figure out who that point represents.
Of course LOL
Is this even possible, 99 hours a week, WHAT |dw:1570925773153:dw|
Did you make all the graphics from the data yourself, or did you find them? I just want to know because it seems a bit interesting. Also wow, pretty cool data.
Yeah I'm using RStudio to make the graphics
thank you
Does it also find the data for you, or do you need to find it yourself?
I downloaded the data to my computer. Then I use RStudio to rearrange the data however I want (like only viewing rows that are male and less than 50 years of age), and with my new data I can draw plots. The code looks like this:

```r
library(dplyr)    # group_by(), summarize(), %>%
library(ggplot2)  # ggplot(), geom_line(), labs(), theme()

# Total the population column for each education level and gender,
# then draw one line per gender
data %>%
  group_by(education_num, gender) %>%
  summarize(n = sum(population)) %>%
  ggplot(aes(education_num, n)) +
  geom_line(aes(color = gender)) +
  theme_bw() +
  labs(title = "Count of Education Levels by Gender", caption = "Figure 1.2",
       x = "education level", y = "count") +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))
```

and the graph for that would look like this: |dw:1570926247203:dw|
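For anyone following along, here is a minimal sketch of the "rearrange" step mentioned above (keeping only rows that are male and under 50). The `people` data frame and its column names are made-up stand-ins, not the real census file, so treat it as an illustration rather than the exact code used for the plots.

```r
library(dplyr)

# Hypothetical toy rows standing in for the census data;
# the column names and values here are assumptions for illustration.
people <- data.frame(
  age    = c(25, 62, 41, 38),
  gender = c("Male", "Male", "Female", "Male"),
  income = c("<=50K", ">50K", "<=50K", ">50K")
)

# Keep only rows that are male and under 50 years of age.
young_men <- people %>%
  filter(gender == "Male", age < 50)

print(young_men)
```

The filtered data frame can then be piped into `ggplot()` the same way as the full data.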
Ohh okay, thank you. Is it done on a terminal or its own platform?
On its own platform.
Ah okay.
What does the 2nd graph represent? How would you interpret it? Perhaps a scatter plot is not the best graph to represent this data?
Did you label the horizontal axis, or did the software?
That was just me casually figuring out if any preschool-only educated people earned more than 50k. A scatter plot is the quickest to make, so I just made it. The software automatically labels the axes, but I can change the axis names.
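A question like "did any preschool-only people earn more than 50k" can also be checked with a count instead of a plot. This is only a rough sketch: the `survey` data frame is invented, and the labels `"Preschool"` and `">50K"` are assumed spellings that may not match the real file exactly.

```r
library(dplyr)

# Hypothetical toy rows; column names and category labels are assumptions.
survey <- data.frame(
  education = c("Preschool", "Preschool", "Bachelors", "HS-grad"),
  income    = c("<=50K", "<=50K", ">50K", "<=50K")
)

# Count rows where education is preschool-only AND income is above 50k.
high_earning_preschool <- survey %>%
  filter(education == "Preschool", income == ">50K") %>%
  nrow()

print(high_earning_preschool)
```

A count of zero would mean no one in that group is above the 50k line, which is the same thing the scatter plot shows visually.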
okay. There are some things that can be done to organize the graphs.