Voice Coding in Blockly

Diluka · August 23, 2024, 7:10am

Hi, I’m Diluka, an undergraduate student working on my final year project. I plan to contribute to Blockly by making it more accessible to students with upper limb motor impairments through voice coding. My approach involves using a generative model to create Blockly blocks based on voice commands. I would appreciate any advice or thoughts you might have on this idea. Thank you!

Peter · August 23, 2024, 7:58am

Would it be something like this?

https://appinventor.mit.edu/blogs/hal/2022/03/21/Aptly

It seems you want to contribute to Blockly itself. You could post your proposal in the Blockly Google Group: https://groups.google.com/g/blockly

Diluka · September 3, 2024, 9:21am

Dear Aptly Team,

Given that your team has successfully developed the concept of generating blocks using natural language, I am seeking your guidance on applying similar techniques to my research project, which aims to generate block code based on voice input.

Here’s the approach I’ve come up with:

1. User Speech Input: The process begins with the user providing a speech input.
2. Speech-to-Text Conversion: This speech input is converted into text using a speech-to-text API.
3. Text Input to Model: The resulting text is fed into a generative model.
4. Model Generates Block Structure: The model processes the text and generates a block structure in JSON format.
5. Blockly Rendering: Finally, the JSON object is rendered in the Blockly environment as a block.
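
To make the steps above concrete, here is a minimal Python sketch of the server-side part of such a pipeline. It is only an illustration under stated assumptions: transcribe and generate_block_json are hypothetical stand-ins for a speech-to-text API and a generative model (both return canned values here), and ALLOWED_BLOCK_TYPES is a made-up allow-list. The block types and the input and field names are those of Blockly's built-in blocks, and the final dictionary follows Blockly's JSON serialization layout, which the browser side could then pass to Blockly.serialization.workspaces.load.

```python
import json

# Hypothetical allow-list; a real project would derive this from its toolbox.
ALLOWED_BLOCK_TYPES = {"controls_repeat_ext", "math_number", "text_print", "text"}


def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text API call (hypothetical, canned result)."""
    return "repeat three times print hello"


def generate_block_json(command: str) -> str:
    """Stand-in for the generative model (hypothetical, canned reply)."""
    return json.dumps({
        "type": "controls_repeat_ext",
        "inputs": {
            "TIMES": {"block": {"type": "math_number", "fields": {"NUM": 3}}},
            "DO": {"block": {
                "type": "text_print",
                "inputs": {
                    "TEXT": {"block": {"type": "text", "fields": {"TEXT": "hello"}}}
                },
            }},
        },
    })


def collect_types(block: dict) -> list:
    """Walk a block and everything attached to it, returning all block types."""
    types = [block["type"]]
    for slot in block.get("inputs", {}).values():
        for key in ("block", "shadow"):
            if key in slot:
                types.extend(collect_types(slot[key]))
    if "next" in block:
        types.extend(collect_types(block["next"]["block"]))
    return types


def to_workspace_state(model_output: str) -> dict:
    """Parse the model reply, reject unknown block types, wrap it for Blockly."""
    block = json.loads(model_output)  # raises ValueError on malformed JSON
    unknown = [t for t in collect_types(block) if t not in ALLOWED_BLOCK_TYPES]
    if unknown:
        raise ValueError(f"model used unknown block types: {unknown}")
    return {"blocks": {"languageVersion": 0, "blocks": [block]}}


# Steps 1-4 chained together; step 5 (rendering) happens in the browser.
state = to_workspace_state(generate_block_json(transcribe(b"")))
print(json.dumps(state, indent=2))
```

Validating the model's reply before it reaches the workspace matters here because a generative model can return malformed JSON or invent block types that do not exist, and either one would fail at load time.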

I would greatly appreciate your insights on whether this approach aligns with best practices or if there are any potential pitfalls or improvements I should consider. Any guidance on optimizing this workflow or suggestions for alternative methods would be invaluable.

Thank you for your time and expertise.

Best regards,
Diluka

Peter · September 3, 2024, 10:37am

I assigned one of the devs to your topic.

ewpatton · September 13, 2024, 7:51pm

In our initial attempts at what would become Aptly, we considered having OpenAI Codex generate the designer and blocks content directly (JSON and XML, respectively, in App Inventor). However, this turned out to be particularly costly with hosted LLMs; our custom language reduces the output cost by 60%. That said, these tools now offer modes that make it easier to generate structured outputs such as JSON, so your approach might be feasible. The main thing will be giving the LLM enough context in the prompt to understand the semantics of the blocks and possibly the Blockly JSON structure.
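
As a rough illustration of the last point, the Python sketch below assembles a prompt that describes each available block and shows one example of the expected JSON. It is a sketch under assumptions, not how Aptly works: BLOCK_CATALOG, EXAMPLE_OUTPUT, build_prompt, and the prompt wording are all made up for this example. Only the block types and their input and field names come from Blockly's built-in blocks.

```python
import json

# Hypothetical catalog describing what each block means and which slots it has.
BLOCK_CATALOG = [
    {"type": "controls_repeat_ext",
     "meaning": "Repeat the statements in DO a given number of times.",
     "inputs": {"TIMES": "number value", "DO": "statements"},
     "fields": {}},
    {"type": "math_number",
     "meaning": "A numeric literal.",
     "inputs": {},
     "fields": {"NUM": "number"}},
    {"type": "text_print",
     "meaning": "Print a value.",
     "inputs": {"TEXT": "value to print"},
     "fields": {}},
    {"type": "text",
     "meaning": "A string literal.",
     "inputs": {},
     "fields": {"TEXT": "string"}},
]

# One worked example of the JSON shape the model is expected to return.
EXAMPLE_OUTPUT = {
    "type": "text_print",
    "inputs": {"TEXT": {"block": {"type": "text", "fields": {"TEXT": "hello"}}}},
}


def build_prompt(command: str) -> str:
    """Build a prompt carrying block semantics plus an example of the JSON format."""
    lines = [
        "You translate spoken commands into Blockly block JSON.",
        "Use only these block types:",
    ]
    for entry in BLOCK_CATALOG:
        lines.append(
            f"- {entry['type']}: {entry['meaning']} "
            f"inputs={json.dumps(entry['inputs'])} "
            f"fields={json.dumps(entry['fields'])}"
        )
    lines.append("Example output for the command 'print hello':")
    lines.append(json.dumps(EXAMPLE_OUTPUT))
    lines.append("Reply with a single JSON object and nothing else.")
    lines.append(f"Command: {command}")
    return "\n".join(lines)


prompt = build_prompt("repeat three times print hello")
print(prompt)
```

The catalog grows with the toolbox, so the prompt length (and with it the input cost) scales with the number of blocks exposed to the model. Limiting the catalog to the blocks a lesson actually needs is one way to keep that in check.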