This project is a university project focused on creating a voice-controlled robotic system. The system comprises the following key components:

- AI Server (`ai_server/`): Hosts various AI models, primarily for speech recognition (converting voice commands to text) and potentially other AI tasks such as vision or advanced command understanding.
- Jetson (`jetson/`): Acts as the main control unit. It captures audio, uses the AI server to interpret voice commands, and then translates these commands into actions for the robotic hardware.
- Arduino (`arduino/`): Manages the low-level control of the robot's hardware, such as motors for movement and other actuators, based on instructions received from the Jetson.
The project is structured as follows:
- `ai_server/`: Contains various AI models and scripts to run them. The primary models and their purposes are:
  - ASR Models: Used for Automatic Speech Recognition (ASR) to convert spoken commands into text.
    - Canary-1B ASR (`canary-1b/`)
    - Distil-Whisper ASR (`distil-whisper/`)
    - Whisper Large v3 ASR (`whisper-large-v3/`)
  - Vision Model:
    - Phi-3 Vision (`Phi-3-vision-128k-instruct/`): Capable of understanding and interpreting visual information.
  - Sentence Transformers:
    - BAAI/bge-m3 (`sentence-transformers/`): Used to create numerical representations (embeddings) of text, enabling semantic understanding of commands.
  - Running the Models:
    - Most models are run using Docker via the `run.sh` scripts in their respective directories (e.g., `cd ai_server/MODEL_DIRECTORY && ./run.sh`).
    - The `sentence-transformers` model is run using `python app.py` in its directory and utilizes Gradio for its interface (see the sketch below).
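For orientation, the `sentence-transformers` entry point might look roughly like the following minimal sketch. This is an illustration built only from the notes above (Gradio interface, model dependencies fetched by `gradio.load()` on first run), not the actual contents of `app.py`:

```python
# Illustrative sketch of ai_server/sentence-transformers/app.py; the real
# script may differ. gr.load() pulls the BAAI/bge-m3 interface from the
# Hugging Face Hub, downloading model dependencies on first run.
import gradio as gr

demo = gr.load("models/BAAI/bge-m3")
demo.launch(server_name="0.0.0.0")  # serve the Gradio interface (default port 7860)
```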
- `arduino/`: Contains the project for the microcontroller that directly controls the robot's hardware.
  - Purpose: Manages the physical operation of the robot, including:
    - Movement: Controls DC motors for forward, backward, left, and right motions.
    - Elevator Mechanism: Controls a stepper motor for upward and downward movements.
  - Development: The project is developed using PlatformIO.
  - Main Code: The core logic for hardware control is located in `src/main.cpp`.
  - Command Interface: The Arduino receives commands from the Jetson via serial communication. Commands are strings in the format `MOVEMENT DURATION` (see the Python sketch after this list for sending one). Examples:
    - `forward five second(s)`
    - `backward two second(s)`
    - `left one second(s)`
    - `right three second(s)`
    - `upward four second(s)`
    - `downward one second(s)`
    - `stop zero second(s)` (the duration is often ignored for `stop`, as it takes effect immediately)
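Because the command interface is plain strings over serial, sending a command from the Jetson side takes only a few lines of Python. This sketch assumes `pyserial` is installed, the Arduino enumerates as `/dev/ttyUSB0`, the firmware runs at 9600 baud, and commands are newline-terminated; verify all of these against `src/main.cpp`:

```python
# Minimal sketch: send one movement command from the Jetson to the Arduino.
# Port, baud rate, and newline termination are assumptions to verify.
import serial

with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as arduino:
    arduino.write(b"forward five second(s)\n")  # MOVEMENT DURATION format
```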
- `jetson/`: Houses the software for the Jetson device, which acts as the central processing unit or "brain" of the robotic system.
  - Role: Orchestrates the overall operation of the robot by processing inputs and sending commands to the hardware controllers.
  - Workflow:
    - Captures audio input (likely via a microphone).
    - Communicates with the `ai_server` to perform Automatic Speech Recognition (ASR) on the audio, converting it to text.
    - Processes the recognized text command. The `jetson/similarity/app.py` script suggests it may use sentence embeddings (potentially from an `ai_server` model) to compare the command against known commands or to handle variations in phrasing (a sketch of this idea follows below).
    - Sends the appropriate serial commands to the `arduino` to actuate the robot's motors and elevator.
  - Subdirectories:
    - `jetson/client/`: Contains client application code. (Note: its `README.md` is currently a template and requires further details specific to this client.)
    - `jetson/similarity/`: Contains code related to command processing and similarity matching, as evidenced by `app.py`.
  - Dockerfile: Contains a `Dockerfile` (`jetson/Dockerfile`), likely for containerizing the Jetson application environment, ensuring consistency and ease of deployment.
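To make the similarity idea concrete, here is a minimal sketch of matching a transcribed phrase against the known command set by cosine similarity. The `embed` callable is hypothetical; in practice it would wrap the `BAAI/bge-m3` service described above, and the actual logic in `jetson/similarity/app.py` may differ:

```python
# Sketch of similarity-based command matching; embed() is a hypothetical
# helper returning an embedding vector (numpy array) for a piece of text.
import numpy as np

KNOWN_COMMANDS = [
    "forward five second(s)", "backward two second(s)",
    "left one second(s)", "right three second(s)",
    "upward four second(s)", "downward one second(s)",
    "stop zero second(s)",
]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_command(heard: str, embed) -> str:
    """Return the known command whose embedding is closest to the heard text."""
    heard_vec = embed(heard)
    return max(KNOWN_COMMANDS, key=lambda cmd: cosine(heard_vec, embed(cmd)))
```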
To set up the AI server (`ai_server/`):

- Prerequisites: Docker must be installed and running. Ensure your system supports GPU passthrough for Docker if using GPU-accelerated models.
- Models:
  - Most models are run via Docker. The `run.sh` script in each model's directory (e.g., `ai_server/canary-1b/run.sh`) should attempt to pull the required Docker image from a registry (e.g., Hugging Face Spaces).
  - If a `run.sh` script fails to pull an image, you may need to pull it manually with `docker pull <image_name>`. The specific image names can be found inside the `run.sh` scripts.
  - The `sentence-transformers` model (`ai_server/sentence-transformers/`) is run directly with Python. It requires Python and Gradio to be installed (`pip install gradio`). Dependencies for the model itself (e.g., `BAAI/bge-m3`) are usually downloaded automatically by the `gradio.load()` call on first run.
To set up the Arduino (`arduino/`):

- Prerequisites: PlatformIO IDE or PlatformIO Core must be installed.
- Building and Uploading:
  - Open the `arduino/` directory in PlatformIO IDE, or navigate to it in your terminal.
  - Build the project using the PlatformIO build command (e.g., `pio run`).
  - Upload the compiled firmware to the Arduino board using the PlatformIO upload command (e.g., `pio run --target upload`). Ensure the Arduino is connected to your computer.
To set up the Jetson (`jetson/`):

- Prerequisites:
  - Python 3 installed.
  - Consider setting up a Python virtual environment.
- Dependencies:
  - Install Python dependencies such as `gradio_client`. Check individual scripts or subdirectories like `jetson/client/` for specific `requirements.txt` files.
  - Ensure any Jetson-specific SDKs or libraries (e.g., NVIDIA JetPack components) required for hardware interaction or accelerated computing are installed.
- Hardware Connection: Ensure the Arduino is connected to the Jetson (e.g., via USB) for serial communication.
Follow these steps to operate the voice-controlled robotic system:
- Start AI Services (`ai_server`):
  - Navigate to the directory of the desired AI model(s) within `ai_server/`. For instance, to use an ASR model: `cd ai_server/canary-1b` (or `distil-whisper`, `whisper-large-v3`).
  - Execute the `run.sh` script: `./run.sh`. This will typically start a Docker container serving the model. Note the port it is running on (e.g., 7860, found in `run.sh` or the script output); the client sketch after this list shows a quick way to test the service.
  - If using the `sentence-transformers` model for command similarity, navigate to `ai_server/sentence-transformers/` and run `python app.py`. Note its port (e.g., 7861).
  - Ensure the IP address and port of these AI services are correctly configured in the Jetson application(s) that will call them. (The example `jetson/similarity/app.py` uses `127.0.0.1:7860`, implying it runs on the same machine or requires configuration if `ai_server` is remote.)
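As a quick sanity check that an ASR service is up, you can call it from Python with `gradio_client`. The endpoint name and argument format depend on the specific model's Gradio app, so inspect them with `view_api()` first; everything below the `Client` call is an assumption:

```python
# Hedged sketch of calling a running ASR service from the Jetson.
from gradio_client import Client

client = Client("http://127.0.0.1:7860/")  # address/port of the ASR container
print(client.view_api())                   # list the endpoints the app actually exposes
# Assumed endpoint name and input; recent gradio_client versions may require
# wrapping file paths with gradio_client.handle_file().
text = client.predict("command.wav", api_name="/predict")
print("Recognized:", text)
```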
- Prepare Arduino (`arduino`):
  - Ensure the Arduino board is flashed with the latest firmware from the `arduino/` directory (as per the setup instructions).
  - Connect the Arduino to the Jetson device (typically via USB). Note the serial port on the Jetson that the Arduino connects to (e.g., `/dev/ttyUSB0` or `/dev/ttyACM0`); the snippet below shows one way to list candidate ports. This may need to be configured in the Jetson application.
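If you are unsure which port the Arduino enumerated on, `pyserial` can list the candidates:

```python
# List serial ports visible on the Jetson (requires pyserial).
from serial.tools import list_ports

for port in list_ports.comports():
    print(port.device, "-", port.description)  # e.g., /dev/ttyUSB0 or /dev/ttyACM0
```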
- Run Jetson Application (`jetson`):
  - Navigate to the main Jetson application directory (e.g., this could be within `jetson/client/src` or a specific project directory).
  - Execute the main Python script (e.g., `python main_control_script.py`, replacing `main_control_script.py` with the actual name of the script).
  - Ensure the Jetson application is configured with (illustrated below):
    - The correct IP address(es) and port(s) for the AI services running on `ai_server`.
    - The correct serial port for communicating with the Arduino.
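The exact configuration mechanism depends on the script, but the values it needs boil down to something like the following (all names and values here are hypothetical placeholders):

```python
# Hypothetical configuration for the Jetson application; adapt the names and
# values to however the actual script reads its settings.
CONFIG = {
    "asr_url": "http://127.0.0.1:7860/",         # ASR service from step 1
    "similarity_url": "http://127.0.0.1:7861/",  # sentence-transformers service
    "arduino_port": "/dev/ttyUSB0",              # serial port from step 2
    "baud_rate": 9600,                           # must match the Arduino firmware
}
```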
- Operate the System:
  - Once all components are running and connected, you should be able to issue voice commands.
  - Speak clearly into the microphone connected to the Jetson system.
  - The Jetson application will process the voice command using the AI server, determine the intended action (possibly using similarity matching), and send the corresponding command to the Arduino to control the robot. A condensed sketch of this flow follows below.
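Putting the earlier sketches together, one pass through the pipeline (record, transcribe, match, actuate) could look roughly like this; the endpoint name, port, and baud rate remain assumptions:

```python
# Hypothetical single pass through the whole pipeline, reusing the
# match_command() helper from the similarity sketch above.
import serial
from gradio_client import Client

def handle_utterance(wav_path: str, embed) -> None:
    asr = Client("http://127.0.0.1:7860/")
    heard = asr.predict(wav_path, api_name="/predict")  # assumed endpoint
    command = match_command(heard, embed)               # nearest known command
    with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as arduino:
        arduino.write((command + "\n").encode())
```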
We welcome contributions to this project! If you'd like to contribute, please follow these general guidelines:
- Fork the Repository: Start by forking the project to your own GitHub account.
- Create a Branch: Create a new branch in your fork for your feature or bug fix. Use a descriptive name (e.g., `feature/new-voice-command` or `fix/arduino-motor-bug`).
- Make Your Changes: Implement your changes, additions, or fixes in your branch.
- Ensure your code is well-commented.
- If adding new features, consider if documentation or examples need updating.
- If applicable, add tests for your changes.
- Test Your Changes: Make sure your changes work as expected and do not break existing functionality.
- Commit Your Changes: Commit your work with clear and descriptive commit messages.
- Push to Your Fork: Push your changes to your branch on your fork.
- Submit a Pull Request: Open a pull request from your branch to the main project's `main` (or `master`) branch. Provide a clear description of your changes in the pull request.
If you're planning a larger contribution, it's a good idea to open an issue first to discuss your ideas.
This project is licensed under the terms described in the LICENSE file.