|
1 |
| -# Docs Scraper |
| 1 | +# Docs Scraper <!-- omit in TOC --> |
2 | 2 |
|
3 |
| -A scraper for MeiliSearch's documentation, indexing the content into a MeiliSearch instance. |
| 3 | +A scraper for your documentation website, indexing the content into a MeiliSearch instance. |
| 4 | + |
| 5 | +- [Installation and Usage](#installation-and-usage) |
| 6 | + - [From source code](#from-source-code) |
| 7 | + - [With Docker](#with-docker) |
| 8 | + - [In a GitHub Action](#in-a-github-action) |
| 9 | + - [About the API Key](#about-the-api-key) |
| 10 | +- [Configuration file](#configuration-file) |
| 11 | +- [Related projects](#related-projects) |
| 12 | +- [Credits](#credits) |
4 | 13 |
|
5 |
| -_Will be generalized soon for all documentations_ |
6 | 14 |
|
7 | 15 | ## Installation and Usage
|
8 | 16 |
|
9 | 17 | This project supports Python 3.6+.
|
10 | 18 |
|
| 19 | +### From source code |
| 20 | + |
11 | 21 | Set both environment variables `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY`.
|
12 | 22 |
|
13 | 23 | Then, run:
|
14 | 24 | ```bash
|
15 | 25 | $ pipenv install
|
16 |
| -$ pipenv shell |
17 |
| -$ ./docs_scraper run config/config.json |
| 26 | +$ pipenv run ./docs_scraper run <path-to-your-config-file> |
| 27 | +``` |
| 28 | + |
| 29 | +### With Docker |
| 30 | + |
| 31 | +```bash |
| 32 | +$ docker run -t --rm \ |
| 33 | + -e MEILISEARCH_HOST_URL=<your-meilisearch-host-url> \ |
| 34 | + -e MEILISEARCH_API_KEY=<your-meilisearch-api-key> \ |
| 35 | + -v <absolute-path-to-your-config-file>:/docs-scraper/config.json \ |
| 36 | + getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json |
18 | 37 | ```
|
19 | 38 |
|
20 |
| -_WIP_ |
| 39 | +### In a GitHub Action |
| 40 | + |
| 41 | +To run after your deployment job: |
| 42 | + |
| 43 | +```yml |
| 44 | +run-scraper: |
| 45 | + needs: <your-deployment-job> |
| 46 | + runs-on: ubuntu-18.04 |
| 47 | + steps: |
| 48 | + - uses: actions/checkout@master |
| 49 | + - name: Run scraper |
| 50 | + env: |
| 51 | + HOST_URL: ${{ secrets.MEILISEARCH_HOST_URL }} |
| 52 | + API_KEY: ${{ secrets.MEILISEARCH_API_KEY }} |
| 53 | + CONFIG_FILE_PATH: <path-to-your-config-file> |
| 54 | + run: | |
| 55 | + docker run -t --rm \ |
| 56 | + -e MEILISEARCH_HOST_URL=$HOST_URL \ |
| 57 | + -e MEILISEARCH_API_KEY=$API_KEY \ |
| 58 | + -v $CONFIG_FILE_PATH:/docs-scraper/config.json \ |
| 59 | + getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json |
| 60 | +``` |
| 61 | +
|
| 62 | +Here is the [GitHub Action file](https://github.com/meilisearch/documentation/blob/master/.github/workflows/gh-pages-scraping.yml) we use in production for the MeiliSearch documentation. |
| 63 | +
|
| 64 | +### About the API Key |
| 65 | +
|
| 66 | +The API key you provide as an environment variable must have permission to add documents to your MeiliSearch instance. |
| 67 | +
|
| 68 | +Thus, you need to provide the private key or the master key. |
| 69 | +
|
| 70 | +_More about [MeiliSearch authentication](https://docs.meilisearch.com/guides/advanced_guides/authentication.html)._ |
| 71 | +
|
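Since both `MEILISEARCH_HOST_URL` and `MEILISEARCH_API_KEY` must be set before the scraper runs, a small pre-flight check can catch a missing value early. The following is a minimal sketch and not part of the scraper itself: the helper name `missing_env_vars` is hypothetical, and it only verifies that the variables are non-empty, not that the key has the required permissions.

```python
import os

# The two variables named in this README.
REQUIRED_VARS = ("MEILISEARCH_HOST_URL", "MEILISEARCH_API_KEY")


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    `environ` defaults to the process environment; passing a plain dict
    makes the helper easy to exercise in isolation (hypothetical helper).
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument inspects the real environment; an empty list means both variables are present.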
| 72 | +## Configuration file |
| 73 | +
|
| 74 | +A generic configuration file: |
| 75 | +
|
| 76 | +```json |
| 77 | +{ |
| 78 | + "index_uid": "docs", |
| 79 | + "start_urls": ["https://www.example.com/doc/"], |
| 80 | + "sitemap_urls": ["https://www.example.com/sitemap.xml"], |
| 81 | + "stop_urls": [], |
| 82 | + "selectors": { |
| 83 | + "lvl0": { |
| 84 | + "selector": ".docs-lvl0", |
| 85 | + "global": true, |
| 86 | + "default_value": "Documentation" |
| 87 | + }, |
| 88 | + "lvl1": { |
| 89 | + "selector": ".docs-lvl1", |
| 90 | + "global": true, |
| 91 | + "default_value": "Chapter" |
| 92 | + }, |
| 93 | + "lvl2": ".docs-content .docs-lvl2", |
| 94 | + "lvl3": ".docs-content .docs-lvl3", |
| 95 | + "lvl4": ".docs-content .docs-lvl4", |
| 96 | + "lvl5": ".docs-content .docs-lvl5", |
| 97 | + "lvl6": ".docs-content .docs-lvl6", |
| 98 | + "text": ".docs-content p, .docs-content li" |
| 99 | + } |
| 100 | +} |
| 101 | +``` |
| 102 | + |
| 103 | +The scraper extracts the content matched by your selectors. |
| 104 | + |
| 105 | +Here is the [configuration file](https://github.com/meilisearch/documentation/blob/master/.vuepress/scraper/config.json) we use for the MeiliSearch documentation. |
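Before launching a crawl, it can help to sanity-check a configuration file against the shape of the generic example above. This is a rough sketch under stated assumptions: treating `index_uid`, `start_urls`, and `selectors` as required is a guess based on that example, not a documented rule, and `check_config` is a hypothetical helper rather than part of the scraper.

```python
import json

# Keys that appear in the generic configuration above; treating them as
# required is an assumption of this sketch, not a documented rule.
EXPECTED_KEYS = ("index_uid", "start_urls", "selectors")


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = ["missing key: " + key for key in EXPECTED_KEYS if key not in config]
    selectors = config.get("selectors", {})
    if isinstance(selectors, dict):
        for name, value in selectors.items():
            # In the example, a selector is either a CSS string or an
            # object carrying a "selector" field.
            if isinstance(value, dict):
                if "selector" not in value:
                    problems.append("selector '%s' has no 'selector' field" % name)
            elif not isinstance(value, str):
                problems.append("selector '%s' should be a string or an object" % name)
    return problems


def check_config_file(path):
    """Load a JSON config file and return its list of problems."""
    with open(path, encoding="utf-8") as handle:
        return check_config(json.load(handle))
```

An empty list from `check_config_file("<path-to-your-config-file>")` means the file matches the shape of the example; it says nothing about whether the selectors actually match your pages.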
21 | 106 |
|
22 | 107 | ## Related projects
|
23 | 108 |
|
24 |
| -For the front part, check out the [docs-searchbar.js repository](https://github.com/meilisearch/docs-searchbar.js), which provides a front-end search bar. |
| 109 | +Once your documentation is crawled, you may want a search bar to improve your users' experience. |
| 110 | + |
| 111 | +For the front end, check out the [docs-searchbar.js repository](https://github.com/meilisearch/docs-searchbar.js), which provides a front-end search bar adapted for documentation. |
25 | 112 |
|
26 | 113 | ## Credits
|
27 | 114 |
|
|