Commit 3cab21c

Merge pull request #44 from ArshnoorKaur21/main
Updated the existing documentation readme.md, proposal.md and created a new folder Documentation
2 parents bd07557 + db8d92a commit 3cab21c

6 files changed: +87 -115 lines changed

Documentation/ML_algorithms.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1 @@
1+
"This readme file provides information about ML algorithms."
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
"This readme file explains the machine learning models used in the project."
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
"This readme file provides information about visualization techniques."

Documentation/code_analysis.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
"This readme file provides information about code analysis."

PROPOSAL.md

Lines changed: 57 additions & 34 deletions
@@ -1,48 +1,63 @@
11
# Project Proposal
22

3-
## Finding Insights from Stackoverflow Developer Survey
3+
## Finding Insights from Stack Overflow Developer Survey
44

5-
Stack overflow is a professional community for developers, Stackoverflow conducts a survey every year the collected data from 2011 has been available for open source on the web with the latest dataset 2020 released on March 5th, 2021. If the dataset analysed professionally using modern tools, would enable us to answer real-world questions effectively. The dataset has covered 275 questions in total.
5+
Stack Overflow is a professional community for developers, conducting an annual survey. The collected data from 2011 onwards has been available for open source on the web, with the latest dataset released in 2020. Analyzing this dataset professionally using modern tools would enable us to answer real-world questions effectively. The dataset includes responses to 275 questions.
66

77
### Project Goal:
88

9-
1. To perform Analysis on 3 years Stackoverflow Dataset and get insights.
10-
2. To perform Data Analysis and answer the below questions.
11-
+ Impact of igher education on salary of the surveyed developers.
12-
+ Impact of education/experience/responsibilities on gender inequalities.
13-
+ Impact on participation rate due to different ethnicity.
14-
+ To find whether there is any difference between men and women's income.
15-
+ Impact on the increase in popularity of a language in the current year due to developer’s interest in the previous year.
9+
1. **Perform Analysis on 3 years of Stack Overflow Dataset:** Extract insights from the data.
10+
2. **Data Analysis Goals:** Answer the following questions:
11+
- What is the impact of higher education on the salary of surveyed developers?
12+
- How do education, experience, and responsibilities affect gender inequalities?
13+
- How does ethnicity impact participation rates?
14+
- Is there a difference in income between men and women?
15+
- How does the previous year's interest in a language affect its popularity in the current year?
16+
3. **Data Visualization Goals:**
17+
- Identify the most commonly used language.
18+
- Analyze the distribution of surveyors based on their developer roles.
19+
- Explore factors affecting job satisfaction.
20+
- Predict the growth of languages for upcoming years based on survey answers.
21+
- Provide insights for IT environment, hiring employees, job seekers, and building a solid résumé.
1622

17-
3. To perform data visualization on
23+
### Data Source and Background
1824

19-
- The most commonly used language.
25+
The dataset is sourced from the annual Stack Overflow developer survey, covering responses from developers in 180 countries. The data range from 2011 to 2020, with the focus being on the last 3 years. Respondents primarily come from the US, India, and EMEA regions, with a background in developer/coding experience. The dataset includes survey data gathered from 180 countries, with responses ranging from "Not at all important" to "Very important" and "Not at all satisfied" to "Very satisfied."
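The importance and satisfaction answers described above are ordinal, so one way to prepare them for analysis is to map each label to an integer rank. The sketch below is illustrative only: the two endpoint labels come from the proposal, while the intermediate labels and the `encode_likert` helper are assumptions and may not match the survey's wording or the project's notebook.

```python
from typing import Optional

# Endpoint labels are taken from the proposal text; the two middle labels
# are assumed for illustration and may differ from the real survey wording.
IMPORTANCE_SCALE = [
    "Not at all important",
    "Somewhat important",  # assumed label
    "Important",           # assumed label
    "Very important",
]

# Case-insensitive lookup from label to 0-based rank.
RANK = {label.lower(): rank for rank, label in enumerate(IMPORTANCE_SCALE)}


def encode_likert(answer: Optional[str]) -> Optional[int]:
    """Return the 0-based rank of an answer, or None if missing or unknown."""
    if answer is None:
        return None
    return RANK.get(answer.strip().lower())
```

Returning `None` for unrecognized labels keeps unanswered questions distinct from the lowest rank, so they can be handled separately during data wrangling.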
2026

21-
- Distribution of surveyors based on their developer role.
27+
### Data Format
2228

23-
- Factors affecting Job satisfaction.
29+
The data is in CSV format, consisting of 252,199 observations and 62 variables.
2430

25-
- Predicting the growth of languages for upcoming years based on the survey answers.
31+
### Projected Work for Insights
2632

27-
###### The Insights can be used to provide information regarding IT environment, hiring employees and job seekers and build a solid résumé.
33+
#### Data Wrangling
2834

29-
### Data Source and Background
35+
- **Dealing with Null Values:** Handle unanswered questions marked as ‘NA’ or ‘Not Applicable’ to ensure precise analysis.
36+
- **Data Conversion/Manipulation:** Convert data for analysis, considering that respondents answered the survey through radio buttons rather than yes or no patterns (Univariate analysis).
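As a rough sketch of the null-handling step, the snippet below reads CSV text with the standard library and replaces the markers named above ('NA', 'Not Applicable') with `None`. The helper names `clean_rows` and `missing_share`, and treating an empty cell as missing, are assumptions made for illustration; the project itself may use a different approach.

```python
import csv
import io
from typing import Dict, List, Optional

# 'NA' and 'Not Applicable' come from the proposal; treating an empty
# cell as missing is an additional assumption.
MISSING_MARKERS = {"", "na", "not applicable"}


def clean_rows(csv_text: str) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text and replace missing-value markers with None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        cleaned.append({
            column: None
            if (value or "").strip().lower() in MISSING_MARKERS
            else value.strip()
            for column, value in row.items()
        })
    return cleaned


def missing_share(rows: List[Dict[str, Optional[str]]], column: str) -> float:
    """Return the fraction of rows in which `column` is missing."""
    if not rows:
        return 0.0
    return sum(row[column] is None for row in rows) / len(rows)
```

Checking `missing_share` per column before modelling helps decide whether to drop a column, drop rows, or impute values.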
3037

31-
The dataset is very diverse and came from a [Stackoverflow developer survey](https://insights.stackoverflow.com/survey/?_ga=2.208907280.304952146.1616422967-1864686930.1616422967) with 275 questions answered from 180 countries. Stackoverflow has data collected through surveys from 2011 to 2020, but for the project, the purpose is to analyze the data of the last 3 years. The people who completed the survey mostly from the US, India, and EMEA regions. The majority of the survey respondents had the background of developer/ coding experience. The data are available in the CSV format ranging from 40 to 150 MB with data of 1.5 Lakh survey participants.The dataset includes survey data gathered from 180 countries, the response ranges from Not at all important to very important/ Not at all satisfied to very satisfied.
38+
#### Techniques Expected to Use in the Project
3239

33-
### Data Format
40+
- ML Algorithms: Utilize algorithms like Random Forest, KNN, AUC for classification problems, logistic regression, and linear regression.
41+
- Data Visualization: Employ data visualization techniques for better understanding and presentation of insights.
42+
- Parameter Analysis: Analyze parameters to fine-tune models and improve accuracy.
3443

35-
The data is in a schema CSV file that consists of 252,199 observations and 62 variables.
44+
#### Project Plan
3645

37-
### Projected work needs to be done for Insights.
46+
**Week 8:** Project Base Setup
47+
- Source control setup on [GitHub](https://github.com/Sanjayviswa/Stackoverflow_survey_Analysis)
48+
- Project Management using tools like MS Project
49+
- Complete Data Wrangling & Basic Analysis
3850

39-
###### Data Wrangling
51+
**Week 10:** Baseline Model Building
52+
- Implement algorithms and build baseline models
4053

41-
**Dealing Null Values**: As this is a developer survey and few questions left unanswered by the respondents as ‘*NA*’ or ‘*Not Applicable*’ so dealing with null values is important to get precise information. Data conversion/ manipulation is also required, as the developer responded to the survey through radio buttons rather than yes or no pattern(Univariate analysis).
54+
**Week 11:** Model Evaluation
55+
- Run tests and evaluate the performance of models
4256

43-
###### Techniques expect to use in the project
57+
**Week 12:** Finalization
58+
- Prepare video presentation summarizing the analysis and insights
4459

45-
Planning to use ML Algorithms like Random, may include, KNN, AUC for classification problems, training model, logistic regression,data visualization, parameter analysis, Linear Regreesion, Root Mean square.
60+
#### Additional Technical Details
4661

4762
> Linear regression(RFE techniques)
4863
@@ -53,20 +68,28 @@ $$
5368
> Root Mean Squared Error Calculations
5469
5570
$$
56-
rmse = \sqrt{(\frac{1}{n})\sum_{i=1}^{n}(y_{i} - x_{i})^{2}}
71+
rmse = \sqrt{\left(\frac{1}{n}\right)\sum_{i=1}^{n}(y_{i} - x_{i})^{2}}
5772
$$
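The RMSE formula above translates directly into a few lines of Python. This is a minimal sketch, assuming `y` holds the observed values and `x` the predicted values; the function name `rmse` and the input checks are illustrative and not taken from the project's code.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], x: Sequence[float]) -> float:
    """Root mean squared error: sqrt((1/n) * sum((y_i - x_i)^2))."""
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if not y:
        raise ValueError("at least one observation is required")
    n = len(y)
    squared_errors = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return math.sqrt(squared_errors / n)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.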
5873

5974

75+
## Potential Impact and Benefits
6076

61-
#### Project plan
62-
63-
**Week 8:** Creating Project base, Source control([GitHub](https://github.com/Sanjayviswa/Stackoverflow_survey_Analysis)), Project Management(MS Project).
64-
65-
- Complete Data Wrangling & basic Analysis.
66-
67-
**Week 10**: Complete baseline Model building with algorithms.
77+
- **Inform Decision-Making:** The insights derived from the analysis of the Stack Overflow Developer Survey data can inform decision-making processes in various domains, including education, recruitment, and workforce development.
78+
- **Address Gender and Ethnic Inequalities:** By analyzing the impact of education, experience, and responsibilities on gender and ethnic inequalities, this project can contribute to raising awareness and identifying strategies to address these disparities in the tech industry.
79+
- **Support Career Development:** The findings from the analysis can provide valuable insights for developers seeking to advance their careers, make informed decisions about their education and training, and enhance their job satisfaction.
80+
- **Contribute to Open Source Community:** By contributing to this project, developers have the opportunity to collaborate with others, share their expertise, and contribute to open-source initiatives aimed at improving data analysis techniques and tools.
81+
- **Empower Data-Driven Decision Making:** The project's focus on ML algorithms, data visualization, and predictive analytics empowers stakeholders to make data-driven decisions, enabling them to stay ahead in a rapidly evolving technology landscape.
6882

69-
**Week 11:** Run tests and evaluate model.
83+
## 👨‍💻 Contributing
7084

71-
**Week 12:** Prepare video presentation.
85+
- **Contributions Welcome:** We welcome contributions from the community to help enhance this project. Whether you're a seasoned developer or just starting out, there are various ways you can contribute:
86+
- **Code Contributions:** Help improve the analysis code, implement new features, or optimize existing algorithms.
87+
- **Data Wrangling:** Assist in cleaning and preprocessing the dataset to ensure accurate analysis.
88+
- **Documentation:** Enhance project documentation, including the README file, to make it more comprehensive and user-friendly.
89+
- **Bug Fixes:** Identify and fix any bugs or issues encountered during the analysis.
90+
- **Feature Requests:** Suggest new features or improvements to further enhance the project.
91+
- **How to Contribute:** To contribute, simply fork the repository, make your changes, and submit a pull request. Be sure to follow the contribution guidelines outlined in the repository.
92+
- **Contributors Recognition:** We greatly appreciate all contributions to this project and will acknowledge contributors in the README file.
93+
- **Join the Discussion:** Feel free to join the discussion on our [GitHub repository](https://github.com/Sanjayviswa/Stackoverflow_survey_Analysis) to share your ideas, ask questions, or collaborate with other contributors.
7294

95+
Crafted by @Sanjayviswa.

readme.md

Lines changed: 26 additions & 81 deletions
@@ -1,124 +1,69 @@
1-
<img src="https://stackoverflow.design/assets/img/logos/so/logo-stackoverflow.png" align="left" height="100" width="450" >
2-
<br>
3-
<br>
4-
<br>
5-
<br>
6-
<br>
7-
<br>
1+
# Stack Overflow Analysis Guidelines
82

9-
[![MIT License](https://img.shields.io/apm/l/atomic-design-ui.svg?)](https://github.com/tterb/atomic-design-ui/blob/master/LICENSEs)
10-
<img src="https://img.shields.io/github/last-commit/Sanjayviswa/Stackoverflow-Analysis">
11-
<img src="https://img.shields.io/github/languages/code-size/Sanjayviswa/Stackoverflow-Analysis">
3+
## 👨‍💻 Demo Video
124

13-
# Stackoverflow Analysis Guidelines
14-
## 👨‍💻 Demo video
15-
16-
17-
https://user-images.githubusercontent.com/30715153/168960157-e9448ea4-206c-44c0-bbd5-5e4770c0411f.mp4
18-
19-
You can start working with repo with simple changes and i have updated couple of issue .
20-
Check out the Main Code: [Stackoverflow-Analysis](https://github.com/sanjay-kv/Stackoverflow-Analysis/blob/main/Stackoverflow_Survey_Analysis.ipynb)
5+
[Watch the demo video](https://user-images.githubusercontent.com/30715153/168960157-e9448ea4-206c-44c0-bbd5-5e4770c0411f.mp4)
216

227
## 👇 Prerequisites
238

24-
Before installation, please make sure you have already installed the following tools:
9+
Before installation, please ensure you have the following tools installed:
2510

26-
27-
- [Git](https://git-scm.com/downloads) Learn Git step-by-step by following the instructions provided [here](https://recodehive.com/how-to-install-git-git-tutorial/).
28-
- [Git](https://git-scm.com/downloads) Learn Step by Step instruction here.
29-
- [recodehive](https://recodehive.com/how-to-install-git-git-tutorial/)
11+
- [Git](https://git-scm.com/downloads) - Learn Git step-by-step by following the instructions provided [here](https://recodehive.com/how-to-install-git-git-tutorial/).
3012
- [Anaconda](https://anaconda.org/anaconda)
3113
- [Jupyter Package](https://anaconda.org/anaconda/jupyter)
3214

3315
## 🛠️ Installation Steps
3416

35-
1. Fork the project
36-
Fork the sanjay-kv/Stackoverflow-Analysis/ repository
37-
Follow these instructions on [how to fork a repository](https://help.github.com/en/articles/fork-a-repo)
38-
2. Clone the project
39-
```
40-
git clone git@github.com:your-username/Stackoverflow-Analysis.git
41-
```
42-
3. Download the orginal data from the [drive link](https://drive.google.com/drive/folders/13W20DfCW2W5GEeKTYTl7R6xV5hmPS2Do?usp=sharing)
43-
4. Open Jupyter Notebook and place the file in the project folder *Make sure you selecting the correct path*
17+
1. **Fork the project**: Fork the `sanjay-kv/Stackoverflow-Analysis` repository. Follow these instructions on [how to fork a repository](https://help.github.com/en/articles/fork-a-repo).
18+
2. **Clone the project**: `git clone git@github.com:your-username/Stackoverflow-Analysis.git`
19+
3. **Download the original data** from the [drive link](https://drive.google.com/drive/folders/13W20DfCW2W5GEeKTYTl7R6xV5hmPS2Do?usp=sharing)
20+
4. **Open Jupyter Notebook** and place the file in the project folder. Make sure you're selecting the correct path.
4421

4522
## Development
4623

47-
We love your desire to give back, and want to make the process as welcoming to newcomers and experts as possible. We're working on developing more intuitive tutorials for individuals of all skill levels and expertise, so if you think the community would value from being walked through the steps you're going through please share! ❤️
48-
49-
## Finding Insights from Stackoverflow Developer Survey
50-
51-
Stack overflow is a professional community for developers, Stackoverflow conducts a survey every year the collected data from 2011 has been available for open source on the web with the latest dataset 2020 released on March 5th, 2021. If the dataset analysed professionally using modern tools, would enable us to answer real-world questions effectively. The dataset has covered 275 questions in total.
24+
We welcome contributions from all levels of experience. If you think the community would benefit from being walked through the steps you're going through, please share! ❤️
5225

53-
### Project Goal:
26+
## Finding Insights from Stack Overflow Developer Survey
5427

55-
1. To perform Analysis on 3 years Stackoverflow Dataset and get insights.
56-
2. To perform Data Analysis and answer the below questions.
57-
+ Impact of higher education on salary of the surveyed developers.
58-
+ Impact of education/experience/responsibilities on gender inequalities.
59-
+ Impact on participation rate due to different ethnicity.
60-
+ To find whether there is any difference between men and women's income.
61-
+ Impact on the increase in popularity of a language in the current year due to developer’s interest in the previous year.
28+
Stack Overflow is a professional community for developers, conducting a survey annually. Analyzing the dataset professionally using modern tools can enable us to answer real-world questions effectively. The dataset covers 275 questions in total.
6229

63-
3. To perform data visualization on
30+
### Project Goals:
6431

65-
- The most commonly used language.
66-
67-
- Distribution of surveyors based on their developer role.
68-
69-
- Factors affecting Job satisfaction.
70-
71-
- Predicting the growth of languages for upcoming years based on the survey answers.
72-
73-
###### The Insights can be used to provide information regarding IT environment, hiring employees and job seekers and build a solid résumé.
32+
1. Perform Analysis on the last 3 years' Stack Overflow Dataset to extract insights.
33+
2. Analyze the impact of higher education, experience, and responsibilities on salary and gender inequalities.
34+
3. Investigate participation rates based on ethnicity and differences in income between men and women.
35+
4. Explore the popularity of programming languages and predict their growth based on survey responses.
7436

7537
### Data Source and Background
7638

77-
https://user-images.githubusercontent.com/30715153/169042852-150e59cf-b742-40bb-bcbd-c34a330c1518.mp4
78-
79-
80-
The dataset is very diverse and came from a [Stackoverflow developer survey](https://insights.stackoverflow.com/survey/?_ga=2.208907280.304952146.1616422967-1864686930.1616422967) with 275 questions answered from 180 countries. Stackoverflow has data collected through surveys from 2011 to 2020, but for the project, the purpose is to analyze the data of the last 3 years. The people who completed the survey mostly from the US, India, and EMEA regions. The majority of the survey respondents had the background of developer/ coding experience. The data are available in the CSV format ranging from 40 to 150 MB with data of 1.5 Lakh survey participants.The dataset includes survey data gathered from 180 countries, the response ranges from Not at all important to very important/ Not at all satisfied to very satisfied.
39+
The dataset comes from the annual Stack Overflow developer survey, covering responses from developers in 180 countries. The data are available in CSV format, ranging from 40 to 150 MB, with responses from 1.5 Lakh survey participants.
8140

8241
### Data Format
8342

84-
The data is in a schema CSV file that consists of 252,199 observations and 62 variables.
85-
86-
### Projected work needs to be done for Insights.
87-
88-
###### Data Wrangling
43+
The data is in a CSV file format with 252,199 observations and 62 variables.
8944

90-
**Dealing Null Values**: As this is a developer survey and few questions left unanswered by the respondents as ‘*NA*’ or ‘*Not Applicable*’ so dealing with null values is important to get precise information. Data conversion/ manipulation is also required, as the developer responded to the survey through radio buttons rather than yes or no pattern(Univariate analysis).
45+
### Expected Work
9146

92-
###### Techniques expect to use in the project
93-
94-
Planning to use ML Algorithms like Random, may include, KNN, AUC for classification problems, training model, logistic regression,data visualization, parameter analysis, Linear Regreesion, Root Mean square.
47+
Data wrangling tasks include handling null values and converting data for analysis. Techniques such as ML algorithms and data visualization will be employed.
9548

9649
## 👨‍💻 Contributing
9750

98-
- Contributions make the open source community such an amazing place to learn, inspire, and create.
99-
- Any contributions you make are **greatly appreciated**.
100-
- Check out our contribution guidelines(yet to update) for more information.
51+
- Contributions are greatly appreciated. Check out our contribution guidelines (yet to be updated) for more information.
10152

10253
## 🛡️ License
10354

104-
LinkFree is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
55+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
10556

10657
## 💪 Thanks to all Contributors
10758

108-
Thanks a lot for spending your time helping this project grow. Thanks a lot! Keep rocking 🍻
59+
Thanks to all contributors for helping this project grow! 🍻
10960

11061
<a href="https://github.com/sanjay-kv/Stackoverflow-Analysis/graphs/contributors">
11162
<img src="https://contrib.rocks/image?repo=sanjay-kv/Stackoverflow-Analysis" />
11263
</a>
11364

11465
## 🙏 Support
11566

116-
This project needs a ⭐️ from you. Don't forget to leave a star ⭐️
117-
118-
119-
120-
121-
122-
This repo is crafted with ♥ and owned/maintained by @sanjay-kv
123-
67+
Don't forget to leave a star ⭐️ for this project!
12468

69+
Crafted with ♥ by @sanjay-kv.
