Commit 5150746

Merge pull request #730 from MichaelViveros/master
Blog post - connecting contributors with projects
2 parents e072211 + 72527b0 commit 5150746

4 files changed

+203
-0
lines changed

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
1+
---
2+
layout: blog-detail
3+
post-type: blog
4+
by: Michael Viveros
5+
title: GSOC - Connecting Contributors with Projects
6+
---
7+
8+
## Introduction
9+
10+
One problem with open-source software is that it can be difficult to find a project to contribute to and to actually get started contributing to it. Some open-source projects can seem intimidating to newcomers because they have a very large code base and minimal documentation about how to get involved. Other projects make it easier for prospective contributors to get oriented and figure out how to participate, for example by having:
11+
- **a contributing guide** with specific steps that potential contributors can follow to get involved
12+
- some well-documented, easy to fix, **beginner-friendly issues** which are great places to learn about the project and get started contributing
13+
- **a chatroom/forum** like Gitter where anyone can easily ask questions
14+
15+
For Google Summer of Code 2017, my [project](https://summerofcode.withgoogle.com/projects/#5367340028919808) was to make it easier for potential contributors to both find projects to contribute to and to get started contributing to projects, all through [Scaladex](https://index.scala-lang.org/) (the Scala Library Index). I accomplished this by highlighting projects that have Contributing Info (contributing guide, beginner-friendly issues and chatroom) on the [front page](https://index.scala-lang.org/) of Scaladex so that potential contributors can easily view projects that they could contribute to and see all the information necessary for contributing in one place. I also added a [Contributing Search page](https://index.scala-lang.org/search?q=&contributingSearch=true) which can be used to query these projects.
16+
17+
[![front-page-contributing](/resources/img/blog/scaladex/front-page-contributing.png)](/resources/img/blog/scaladex/front-page-contributing.png)
18+
*Highlighted projects with Contributing Info on the front page of Scaladex*
19+
20+
Furthermore, I improved the search feature of Scaladex by adding [GitHub Topics](https://github.com/blog/2309-introducing-topics) to the projects stored in Scaladex so that users can search for projects by topic. Topics are essentially categories that open-source projects belong to, such as android, databases or json.
21+
22+
[![front-page-topics](/resources/img/blog/scaladex/front-page-topics.png)](/resources/img/blog/scaladex/front-page-topics.png)
23+
*Topics for projects on the front page of Scaladex*
24+
25+
## Contributing Info
26+
### How it Works
27+
To Scaladex, Contributing Info consists of the three pieces of information mentioned above: a contributing guide, a collection of beginner-friendly issues, and a link to a chatroom/forum for the project where newcomers can easily ask questions. Perhaps most importantly, Scaladex can automatically obtain much of this info for your project! It's also possible for a project maintainer to manually set this information on their project's edit page in Scaladex.
28+
29+
Here's some more info about how each piece of contributing info gets set:
30+
- beginner-friendly label/issues - first the label used to identify beginner-friendly issues has to be manually set in the edit page, then Scaladex will fetch the corresponding issues
31+
- chatroom - auto-populated to a project's Gitter room if it has one
32+
- contributing guide - auto-populated to a project's CONTRIBUTING.md if it has one
33+
34+
As an example, the [Scaladex project](https://github.com/scalacenter/scaladex) (the code behind the website) uses the label "low-hanging fruit" to mark beginner-friendly issues in GitHub. The maintainer can set this label on the edit project page, and all the [issues with this label](https://github.com/scalacenter/scaladex/labels/low-hanging%20fruit) will then be stored for the project. The project also has a [Gitter room](https://gitter.im/scalacenter/scaladex) for chatting and a [contributing guide](https://github.com/scalacenter/scaladex/blob/master/CONTRIBUTING.md), both of which are auto-populated when all the projects are indexed.
35+
36+
Scaladex uses GitHub's [GraphQL API](https://developer.github.com/v4/) to get a project's beginner-friendly issues; see the [Github Topics](#github-topics) section below for more info about GitHub's GraphQL API. To get a project's contributing guide, Scaladex uses GitHub's [REST API](https://developer.github.com/v3/) to send a GET request to the [Community Profile API](https://developer.github.com/v3/repos/community/), which returns links to a project's contributing guide, code of conduct and license. Lastly, to get a project's chatroom, Scaladex generates a URL for the project's Gitter room from the project's repository name and the organization it belongs to (e.g. <https://gitter.im/scalacenter/scaladex>) and checks whether that URL exists.
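To make the URL generation concrete, here is a minimal sketch of how the two lookups could derive their URLs. It is not the code Scaladex actually uses: `ContributingLinks`, `gitterUrl` and `communityProfileUrl` are hypothetical names, `GithubRepo` is a stand-in for Scaladex's own type, and the Community Profile path is the one I understand GitHub's REST documentation to describe. The existence check (actually requesting the URL) is left out.

```scala
object ContributingLinks {
  // Hypothetical stand-in for Scaladex's own repository type.
  final case class GithubRepo(organization: String, repository: String)

  // A project's Gitter room is assumed to live at gitter.im/<organization>/<repository>.
  def gitterUrl(repo: GithubRepo): String =
    s"https://gitter.im/${repo.organization}/${repo.repository}"

  // Assumed REST endpoint of the Community Profile API for a repository.
  def communityProfileUrl(repo: GithubRepo): String =
    s"https://api.github.com/repos/${repo.organization}/${repo.repository}/community/profile"
}
```

For the Scaladex repository, `gitterUrl` yields `https://gitter.im/scalacenter/scaladex`, the room linked above.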
37+
38+
You can also find Contributing Info on the front page of Scaladex, which now highlights a random subset of the projects that have Contributing Info. A new random selection is picked each time the page is loaded so that all projects with Contributing Info get the same amount of exposure. We hope this highlights projects and issues that are of interest to potential contributors and guides them there!
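The random selection can be sketched in a few lines. This is only an illustration of the idea, assuming the candidate projects are already in memory; `FrontPage` and `randomSelection` are hypothetical names, not Scaladex's.

```scala
import scala.util.Random

object FrontPage {
  // Shuffle the candidates and keep the first `count`, so every project is
  // equally likely to be shown on a given page load.
  def randomSelection[A](projects: Seq[A], count: Int, rng: Random = new Random()): Seq[A] =
    rng.shuffle(projects).take(count)
}
```

Passing the `Random` instance as a parameter makes it possible to use a fixed seed when testing.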
39+
40+
The Contributing Search page is similar to the normal search page in Scaladex, where you can search for projects by keyword, but it only shows projects that have Contributing Info. Instead of just showing a project's description in the search results, it shows the Contributing Info for each result. Additionally, the Contributing Search page filters each project's beginner-friendly issues. For example, if you enter a search term related to documentation like "docs", the [search results](https://index.scala-lang.org/search?q=docs&contributingSearch=true) will contain the issues related to documentation for each project.
41+
42+
[![contributing-search](/resources/img/blog/scaladex/contributing-search.png)](/resources/img/blog/scaladex/contributing-search.png)
43+
*The Contributing Search page in Scaladex which shows Contributing Info like beginner-friendly issues for each result*
44+
45+
### Code
46+
The code for Contributing Info was committed in two pull requests, one for the [back-end](https://github.com/scalacenter/scaladex/pull/448) and one for the [front-end](https://github.com/scalacenter/scaladex/pull/467).
47+
48+
### Challenge
49+
One interesting challenge I ran into was filtering a project's issues based on a search term. For example, say a user is searching for all issues related to documentation, so they enter "docs" as a search term in the Contributing Search page. A project called akka-http has some beginner-friendly issues, one of which is related to documentation and has the title "#22874 - Add examples to Sink.actorRefWithAck and Source.queue docs". Since this is the only issue for akka-http that has "docs" in its title, it should be the only issue that shows up for akka-http in the search results.
50+
51+
All the projects in Scaladex are stored in an [Elasticsearch index](https://www.elastic.co/blog/what-is-an-elasticsearch-index), which is roughly the equivalent of a database in a relational database system. Each project stored in Elasticsearch has the following fields:
52+
```
name: Text
description: Text
isDeprecated: Boolean
github: Object
  readme: Text
  commits: Long
  beginnerIssues: Nested
    number: Long
    title: Text
...
```
64+
Each project has a `github` field of type `Object` containing GitHub info like a project's readme and its number of commits. The `github` field has a `beginnerIssues` field, which is a list of the project's beginner-friendly issues. The `beginnerIssues` field is of type [Nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html), which is a special version of the `Object` type used for lists of `Object`s. Each issue in `beginnerIssues` is of type `Object` and has a `number` field and a `title` field.
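On the Scala side, a document shaped like this maps naturally onto nested case classes. The sketch below is an assumption about that shape, not Scaladex's actual model: the field types, the `GithubInfo` name and the `withIssues` helper are hypothetical, and the real classes have many more fields.

```scala
object ProjectModel {
  // One beginner-friendly issue, mirroring the `number` and `title` fields.
  final case class GithubIssue(number: Long, title: String)

  // The `github` object: readme, commit count and the nested list of issues.
  final case class GithubInfo(readme: String, commits: Long, beginnerIssues: List[GithubIssue])

  // `github` is optional because a project may have no GitHub info stored.
  final case class Project(name: String, description: String, isDeprecated: Boolean, github: Option[GithubInfo])

  // Replace a project's beginner-friendly issues, leaving everything else untouched.
  def withIssues(project: Project, issues: List[GithubIssue]): Project =
    project.copy(github = project.github.map(_.copy(beginnerIssues = issues)))
}
```

Because case classes are immutable, swapping in a different list of issues means copying the project and its `github` value, which is the pattern the search code further down relies on.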
65+
66+
When Scaladex translates the input search term ("docs" from the example above) into an Elasticsearch query, all it takes to match the search term against a project's beginner-friendly issues is to add a [Nested Query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html) against the `github.beginnerIssues` field and specify that the search term should be matched against the issue's `title` field. This is the Nested Query I added to [DataRepository.scala](https://github.com/scalacenter/scaladex/pull/467/commits/5bcecb58e91c52590e4460189d0415db4d4d2e1f#diff-c5de88d14364dfaadbdecdc462d6c7d1R254), which generates the Elasticsearch query:
67+
```
nestedQuery("github.beginnerIssues",
            termQuery("github.beginnerIssues.title", searchTerm))
```
71+
72+
This sort of worked. It would return the correct projects that have issues matching the search term, but instead of returning only the issues related to the search term, it would return all the issues. So in the example with the "docs" search term, all of akka-http's issues would be returned, not just the one related to documentation.
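The difference between the two behaviours can be reproduced with plain collections. This is an in-memory analogy only, assuming a simple whole-word match on the title; it does not use Elasticsearch, and `NestedMatch` and its members are hypothetical names.

```scala
object NestedMatch {
  final case class Issue(number: Long, title: String)
  final case class Project(name: String, beginnerIssues: List[Issue])

  // Simplified stand-in for the term match: the term must equal one lower-cased word of the title.
  private def matches(issue: Issue, term: String): Boolean =
    issue.title.toLowerCase.split("\\W+").contains(term.toLowerCase)

  // What the nested query alone gives: projects with at least one matching
  // issue, each returned with all of its issues.
  def matchingProjects(projects: List[Project], term: String): List[Project] =
    projects.filter(_.beginnerIssues.exists(matches(_, term)))

  // The desired result: the same projects, but each narrowed to its matching issues.
  def matchingProjectsWithFilteredIssues(projects: List[Project], term: String): List[Project] =
    matchingProjects(projects, term).map { project =>
      project.copy(beginnerIssues = project.beginnerIssues.filter(matches(_, term)))
    }
}
```

With an akka-http project holding issue #22874 plus one unrelated issue, `matchingProjects` returns both issues for the term "docs", while `matchingProjectsWithFilteredIssues` returns only #22874.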
73+
74+
After looking through the Elasticsearch documentation for a while, I came across [Inner Hits](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html), which can be used with Nested Queries to select the nested inner objects that matched the query. In other words, inner hits return only the beginner-friendly issues that matched the search term. I updated the code that creates the Nested Query to also extract the inner hits that get returned:
75+
```
nestedQuery("github.beginnerIssues",
            termQuery("github.beginnerIssues.title", searchTerm))
  .inner(innerHits("issues").size(7))
```
80+
81+
And then I added the filtered beginner-friendly issues from inner hits to the project that gets created from the results of the elasticsearch query. I did this by updating the code in [package.scala](https://github.com/scalacenter/scaladex/pull/467/commits/5bcecb58e91c52590e4460189d0415db4d4d2e1f#diff-0aa128fca8ddf4b576663970f7fc4940R39) that reads in each result of the elasticsearch query (`hit`) and converts it to a Scala `Project` object which is used by the server elsewhere.
82+
```
implicit object ProjectAs extends HitReader[Project] {
  override def read(hit: Hit): Either[Throwable, Project] = {
    // Deserialize the whole project document and remember its elasticsearch id
    val project = nread[Project](hit.sourceAsString).copy(id = Some(hit.id))

    // If the nested query produced inner hits named "issues", keep only those
    // issues on the project; otherwise return the project unchanged
    val projectWithFilteredIssues = hit.asInstanceOf[RichSearchHit].innerHits
      .get("issues")
      .collect {
        case searchHits if searchHits.totalHits > 0 =>
          project.copy(
            github = project.github.map { github =>
              github.copy(
                beginnerIssues = searchHits
                  .getHits()
                  .map(innerHit => nread[GithubIssue](innerHit.getSourceAsString))
                  .toList
              )
            }
          )
      }
      .getOrElse(project)

    tryEither(projectWithFilteredIssues)
  }
}
```
111+
112+
## GitHub Topics
113+
### How it Works
114+
To categorize projects in Scaladex, the old process was for project maintainers to manually set keywords for their project in Scaladex. Users could then search for projects based on keywords.
115+
116+
GitHub recently added ["topics"](https://github.com/blog/2309-introducing-topics) to projects stored in GitHub. Topics are labels that can be set for a project, corresponding to the categories the project belongs to. They are essentially the same as keywords in Scaladex, but maintainers can set them for their project in GitHub instead of having to do so in Scaladex.
117+
118+
Topics are part of GitHub's new [GraphQL API](https://developer.github.com/v4/), which is meant to eventually replace their old [REST API](https://developer.github.com/v3/). [GraphQL](http://graphql.org/) describes itself as "A query language for your API". It combines a query language with a graph-structured schema that models data with objects as nodes and relationships between objects as edges. It was developed by Facebook and differs from a traditional REST API in that all requests go to one route, and a query in the request body specifies precisely what data you want.
119+
120+
With GitHub's REST API, you have to make multiple requests to different routes to get info about multiple projects, and each request returns all the data related to that request. For example, if you wanted to get the 3 most recently created issues for 5 different projects, you would make 5 requests to 5 different routes, one for each project, and each request would return all of that project's issues. With the GraphQL API, all requests are made to the same route, and in the body of the request you put a GraphQL query which specifies exactly what information you want and for which projects. So for the example above, you would make 1 request to 1 route containing a query for only the 3 most recent issues of each of the 5 projects, and only those 3 issues per project would be returned. This results in fewer requests to GitHub's API and less data returned in each response.
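As an illustration of the single-request approach, the sketch below builds one GraphQL document covering several repositories by giving each `repository` field its own alias. It is not taken from Scaladex: `BatchedIssuesQuery`, `recentIssuesQuery` and the alias names are hypothetical, `GithubRepo` is a stand-in type, and the `issues` arguments reflect my understanding of GitHub's schema. Names are interpolated without escaping, so the sketch assumes they contain no quote characters.

```scala
object BatchedIssuesQuery {
  // Hypothetical stand-in for Scaladex's own repository type.
  final case class GithubRepo(organization: String, repository: String)

  // Build one GraphQL document asking for the `count` most recently created
  // issues of every repository. Each repository gets its own alias
  // (repo0, repo1, ...) so the same field can be requested several times.
  def recentIssuesQuery(repos: Seq[GithubRepo], count: Int): String = {
    val selections = repos.zipWithIndex.map { case (repo, i) =>
      s"""  repo$i: repository(owner: "${repo.organization}", name: "${repo.repository}") {
         |    issues(first: $count, orderBy: {field: CREATED_AT, direction: DESC}) {
         |      nodes { number title }
         |    }
         |  }""".stripMargin
    }
    selections.mkString("query {\n", "\n", "\n}")
  }
}
```

The resulting string would be sent as the `query` value of a single POST body, in the same way as the topics query shown further down.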
121+
122+
So I replaced keywords with topics for projects in Scaladex and used GitHub's new GraphQL API to fetch the topics. These topics are fetched for all projects when the server is indexed. Far more projects have topics than had keywords (which had to be set manually by maintainers in Scaladex), so this greatly improved the ability to search for projects by category in Scaladex.
123+
124+
Here's the code I added to [GithubDownload.scala](https://github.com/scalacenter/scaladex/commit/a771d7a70fdb7aaa0003abf48aaa87a622d89f03#diff-e03c541cf1bd7ec0322a9a6571160bebR339). It contains the GraphQL query that is put in the POST body of the request sent to GitHub's GraphQL API to fetch topics for a project. You can see the graph structure of GraphQL in the query: it first gets a `repository` node, then accesses its topics through the `repositoryTopics` edge/connection, and finally selects the names of the topics belonging to that repository.
125+
```
private def topicQuery(repo: GithubRepo): JsObject = {

  val query =
    """
      query($owner:String!, $name:String!) {
        repository(owner: $owner, name: $name) {
          repositoryTopics(first: 50) {
            nodes {
              topic {
                name
              }
            }
          }
        }
      }
    """
  Json.obj(
    "query" -> query,
    "variables" -> s"""{ "owner": "${repo.organization}", "name": "${repo.repository}" }"""
  )
}
```
148+
If you run that query for the [akka](https://github.com/akka/akka) project, this is what gets returned from the Github API:
149+
```
{
  "data": {
    "repository": {
      "repositoryTopics": {
        "nodes": [
          {
            "topic": {
              "name": "reactive"
            }
          },
          {
            "topic": {
              "name": "distributed-systems"
            }
          },
          {
            "topic": {
              "name": "concurrency"
            }
          },
          {
            "topic": {
              "name": "high-performance"
            }
          },
          {
            "topic": {
              "name": "akka"
            }
          },
          {
            "topic": {
              "name": "actor-model"
            }
          },
          {
            "topic": {
              "name": "distributed-actors"
            }
          }
        ]
      }
    }
  }
}
```
196+
197+
### Code
198+
The code for Github Topics was committed in [one pull request](https://github.com/scalacenter/scaladex/pull/421).
199+
200+
## Closing Remarks
201+
Huge thanks to my mentor Heather Miller, who was very approachable and always took the time to discuss the best way to implement this project.
202+
203+
Also thanks to Guillaume Massé for being a super dev teammate and to Julien Richard-Foy for providing great feedback on my pull requests.