Stack Overflow is a professional community for developers, Stack Overflow conduct…
###### The insights can be used to provide information about the IT environment, to help employers hire, and to help job seekers build a solid résumé.
### Data Source and Background
The dataset is very diverse and comes from the [Stack Overflow developer survey](https://insights.stackoverflow.com/survey/?_ga=2.208907280.304952146.1616422967-1864686930.1616422967), which has 275 questions answered by respondents from 180 countries. Stack Overflow has collected survey data from 2011 to 2020, but this project analyzes only the data from the last 3 years. Most respondents are from the US, India, and the EMEA region, and the majority have a developer or coding background. The data is available as CSV files ranging from 40 to 150 MB and covers about 1.5 lakh (150,000) survey participants. Responses are given on scales that range from "Not at all important" to "Very important" and from "Not at all satisfied" to "Very satisfied".
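
Because these answers lie on ordered scales, they can be encoded as ordinal integers before any modelling. The sketch below assumes each answer is a plain string; the `encode_ordinal` helper and the scale lists are hypothetical, and only the endpoint labels come from the survey description above (the intermediate labels are invented placeholders).

```python
# Hypothetical ordinal scales: only the endpoint labels come from the survey
# description; the intermediate labels are assumptions for illustration.
IMPORTANCE_SCALE = [
    "Not at all important",
    "Somewhat important",   # assumed label
    "Fairly important",     # assumed label
    "Very important",
]
SATISFACTION_SCALE = [
    "Not at all satisfied",
    "Somewhat satisfied",   # assumed label
    "Very satisfied",
]


def encode_ordinal(answer, scale):
    """Map a Likert-style answer to its 0-based position on the scale.

    Matching ignores case and surrounding whitespace. Blank or unrecognised
    answers return None so they can be treated as missing values later.
    """
    if answer is None:
        return None
    normalized = answer.strip().lower()
    for position, label in enumerate(scale):
        if label.lower() == normalized:
            return position
    return None


answers = ["Very important", "not at all important", "", "Fairly important"]
print([encode_ordinal(a, IMPORTANCE_SCALE) for a in answers])  # [3, 0, None, 2]
```

Keeping unrecognised answers as `None` rather than guessing a position means they flow into the same null-handling step as the survey's own missing values.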
### Data Format
The data is in CSV format with a defined schema and consists of 252,199 observations (rows) and 62 variables (columns).
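
Because the project analyzes several survey years, the yearly CSV files have to be combined into one table before observations and variables can be counted. A minimal sketch using Python's standard `csv` module follows; the function name `stack_survey_years`, the added `Year` column, and the in-memory sample data are assumptions for illustration, not the project's actual files or columns.

```python
import csv
import io


def stack_survey_years(sources):
    """Combine per-year survey CSVs into one list of row dictionaries.

    `sources` maps a survey year to a readable text stream (for example an
    open file). Each row gets an extra "Year" field, and the returned column
    list is the union of all headers in first-seen order, plus "Year".
    """
    rows = []
    columns = []
    for year, stream in sorted(sources.items()):
        reader = csv.DictReader(stream)
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        for row in reader:
            row["Year"] = year
            rows.append(row)
    if "Year" not in columns:
        columns.append("Year")
    # Questions that were not asked in a given year are filled with None.
    for row in rows:
        for name in columns:
            row.setdefault(name, None)
    return rows, columns


# Invented sample data standing in for two yearly survey files.
data_2019 = io.StringIO("Respondent,Country\n1,India\n2,United States\n")
data_2020 = io.StringIO("Respondent,Country,Age\n1,Germany,29\n")
rows, columns = stack_survey_years({2020: data_2020, 2019: data_2019})
print(len(rows), len(columns))  # 3 4
```

The pair `(len(rows), len(columns))` plays the role of the observation and variable counts; question sets change between survey years, so taking the union of headers avoids silently dropping columns.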
### Planned Work for the Insights
###### Data Wrangling
**Handling Null Values**: Because this is a developer survey, respondents left a few questions unanswered or answered them with ‘*NA*’ or ‘*Not Applicable*’, so handling null values is important for getting precise information. Data conversion and manipulation are also required, because developers responded to the survey through radio buttons (single-choice options) rather than in a yes/no pattern (univariate analysis).
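
The two wrangling steps above can be sketched in plain Python as follows. This is an illustration under assumptions, not the project's pipeline: the set of missing-value spellings, the helper names `clean_missing` and `one_hot`, and the sample rows are invented, and one-hot encoding is chosen here as one common way to convert single-choice answers into numeric columns.

```python
# Answers treated as missing; the exact spellings are assumptions based on
# the 'NA' / 'Not Applicable' values described above.
MISSING_MARKERS = {"", "na", "n/a", "not applicable"}


def clean_missing(rows, columns):
    """Replace missing-value markers with None and count them per column."""
    missing_counts = {name: 0 for name in columns}
    for row in rows:
        for name in columns:
            value = row.get(name)
            if value is None or (
                isinstance(value, str) and value.strip().lower() in MISSING_MARKERS
            ):
                row[name] = None
                missing_counts[name] += 1
    return missing_counts


def one_hot(rows, column):
    """Expand a single-choice (radio-button) column into 0/1 indicator columns.

    Rows whose answer is None get 0 in every indicator column.
    Returns the names of the new indicator columns.
    """
    categories = sorted({row.get(column) for row in rows} - {None})
    for row in rows:
        for category in categories:
            row[f"{column}_{category}"] = 1 if row.get(column) == category else 0
    return [f"{column}_{category}" for category in categories]


# Invented sample rows for illustration.
rows = [
    {"EdLevel": "Bachelor's degree", "Hobbyist": "Yes"},
    {"EdLevel": "NA", "Hobbyist": "No"},
    {"EdLevel": "Master's degree", "Hobbyist": "Not Applicable"},
]
print(clean_missing(rows, ["EdLevel", "Hobbyist"]))  # {'EdLevel': 1, 'Hobbyist': 1}
print(one_hot(rows, "EdLevel"))
```

Counting missing values per column before converting anything makes it easier to decide whether a question should be dropped, imputed, or kept with an explicit "missing" category.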
###### Techniques Expected to Be Used in the Project
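The plan is to use machine learning algorithms and techniques that may include:

- **Classification:** Random Forest, k-nearest neighbors (KNN), and logistic regression, evaluated with AUC (area under the ROC curve)
- **Regression:** linear regression, evaluated with root mean square error (RMSE)
- **General:** model training, parameter analysis, and data visualization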
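
In practice these techniques would typically come from a machine learning library. To show what the two evaluation metrics measure, the sketch below implements a k-nearest-neighbors scorer, AUC, one-feature least-squares linear regression, and RMSE in plain Python. It is illustrative only: the function names, the toy data, and `k=3` are assumptions, and Random Forest and logistic regression are not shown.

```python
import math


def knn_predict_proba(train_x, train_y, query, k=3):
    """Score a query point as the share of its k nearest neighbours labelled 1.

    Uses Euclidean distance; labels are assumed to be 0 or 1.
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    return sum(y for _, y in neighbours) / len(neighbours)


def roc_auc(labels, scores):
    """AUC: probability that a random positive scores above a random negative.

    Ties count as half (the Mann-Whitney U formulation of AUC).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one feature)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy classification data (invented): two features, binary label.
train_x = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = [0, 0, 1, 1]
test_x = [(1.2, 1.4), (5.5, 5.0)]
test_y = [0, 1]
scores = [knn_predict_proba(train_x, train_y, q, k=3) for q in test_x]
print(roc_auc(test_y, scores))  # 1.0

# Toy regression data (invented): y = 2x exactly, so RMSE is 0.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
predictions = [slope * x + intercept for x in [1, 2, 3, 4]]
print(slope, intercept, rmse([2, 4, 6, 8], predictions))  # 2.0 0.0 0.0
```

AUC only looks at how the scores rank positives against negatives, so it suits classifiers that output probabilities, while RMSE is in the same units as the regression target, which makes it easy to interpret.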
## 👨‍💻 Contributing
- Contributions make the open source community such an amazing place to learn, inspire, and create.

This project needs a ⭐️ from you. Don't forget to leave a star ⭐️