Building a Historical AI Avatar: Martin Luther in Godot
06.01.2025
The Vision
As someone passionate about both history and education, I’ve always been frustrated by how static and one-dimensional historical education can be. Reading about Martin Luther’s theological ideas is one thing, but what if you could actually engage with him in conversation?
The idea behind this project was to bridge the gap between historical texts and modern interactive learning. By combining LLM technology with character animation, we created an AI avatar that doesn’t just recite facts, but engages in meaningful dialogue about theology, reform, and faith – all while maintaining historical accuracy.
We also set ourselves the challenge of building the whole thing in one day.
Technical Stack
- Game Engine: Godot 4.3
- AI Integration: OpenAI API with gpt-4o-mini model
- Animation: Custom 2D sprite-based system with dynamic mouth shapes and expression blending
- Primary Sources: Curated collection of Luther’s writings and contemporary documents
- Configuration: Secure key management with environment and file fallbacks
- Text Reveal: Optimized character-by-character animation system with natural timing
Development Journey
Phase 1: Foundation and AI Integration
One of our earliest challenges was finding the right “voice” for Luther. Initial attempts either produced responses that were too modern or too stiff and academic. The breakthrough came when we restructured the prompt to emphasize Luther’s role as a teacher and reformer.
A key technical decision was implementing a layered architecture for the LLM integration:
# Base LLM client that handles common functionality
class_name LLMClient extends Node
# Specific implementation for OpenAI
class_name OpenAIClient extends LLMClient
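As a rough sketch of what that base layer looks like (the signal and method names here are illustrative, not the exact ones from the project), the shared class defines the interface that every provider must implement:

```gdscript
# Hypothetical sketch of the provider-agnostic base class.
class_name LLMClient extends Node

# Emitted when a full response (or an error) comes back from the provider.
signal response_received(text: String)
signal request_failed(error: String)

# Conversation history as role/content dictionaries,
# e.g. {"role": "user", "content": "..."}
var messages: Array[Dictionary] = []

# Subclasses override this with provider-specific HTTP calls.
func send_message(_user_text: String) -> void:
    push_error("send_message() must be implemented by a subclass")
```

Swapping providers then only means registering a different subclass; the rest of the game talks to the `LLMClient` interface.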
The OpenAI client implementation includes robust configuration handling:
class_name OpenAIClient extends LLMClient

const API_CONFIG_FILE = "res://config/openai_config.cfg"
const API_KEY_ENV_VAR = "OPENAI_API_KEY"

var _cached_key: String = ""
var _config: ConfigFile

func _init() -> void:
    _config = ConfigFile.new()
    _load_config()

func _load_config() -> void:
    # Try environment variable first
    var env_key = OS.get_environment(API_KEY_ENV_VAR)
    if not env_key.is_empty():
        _cached_key = env_key
        print("Using API key from environment")
        return
    # Fall back to config file
    var err = _config.load(API_CONFIG_FILE)
    if err == OK:
        var config_key = _config.get_value("api", "key", "")
        if not config_key.is_empty():
            _cached_key = config_key
            print("Using API key from config file")
            return
    print("No API key found in environment or config")

func _save_api_key(key: String) -> void:
    _config.set_value("api", "key", key)
    _config.save(API_CONFIG_FILE)
    _cached_key = key

var api_key: String:
    get:
        if _cached_key.is_empty():
            _load_config()
        return _cached_key
We also created a template configuration file to guide setup:
# openai_config.template.cfg
[api]
key="your-api-key-here"
model="gpt-4o-mini"
temperature=0.7
max_tokens=150
[system]
debug_logging=false
cache_responses=true
This abstraction proved valuable as it allowed us to:
- Easily switch between different LLM providers
- Test with different models (starting with gpt-4o-mini)
- Maintain consistent interface for the rest of the application
- Securely handle API keys with multiple fallback options
We quickly built a simple system with the OpenAI API, and added this layer of abstraction to easily switch between different LLM providers and models. That way the project can get smarter and cheaper as models improve.
Some prompt engineering gets us to a reasonable level of Martin Luther-ness. After some experimentation, we found that prompting for a teacher roleplaying as Martin Luther works best. That way the system is not easily confused by modern topics, and steers the conversation back to Luther’s theological views.
Here’s an example of the system prompt we use:
You are roleplaying as Martin Luther, the 16th-century Protestant Reformer. Stay in character while being aware that this is a historical portrayal. Your responses should reflect Luther’s theological views, personality, and historical context.
Core Character Traits:
– Strong convictions about faith, scripture, and salvation through grace
– Direct and passionate communication style
– Scholarly but able to speak to common people
– Known for both serious theological discourse and witty remarks
The animation commands are integrated into the prompt, allowing the LLM to control character expressions and movements naturally:
Animation Commands:
[POSE=X] for base pose changes
[EMOTION=X] for facial expressions
[GESTURE=X] for specific movements
[MOUTH=X] for mouth shapes during speech
Available commands:
POSE: neutral, thoughtful, teaching, passionate, relaxed
EMOTION: neutral, happy, concerned, stern, passionate
GESTURE: nod, shake_head, point_up, hand_wave, book_reference
MOUTH: A, F, I, L, M, O, S, U, idle
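The model’s replies then arrive with these tags embedded in the prose, so they need to be stripped out before display and dispatched to the animation system. A minimal sketch of how that parsing could work (the project’s actual implementation may differ):

```gdscript
# Hypothetical sketch: extract [KEY=value] commands from an LLM reply.
var command_pattern := RegEx.create_from_string("\\[(POSE|EMOTION|GESTURE|MOUTH)=(\\w+)\\]")

func parse_response(raw: String) -> Dictionary:
    var commands: Array[Dictionary] = []
    for m in command_pattern.search_all(raw):
        commands.append({"type": m.get_string(1), "value": m.get_string(2)})
    # Remove the tags so only spoken text reaches the dialogue box
    var clean_text := command_pattern.sub(raw, "", true)
    return {"text": clean_text.strip_edges(), "commands": commands}
```

The returned `commands` array can then be fed to the animation manager in order, while `text` goes to the text reveal system.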
Phase 2: Animation System
The animation system has evolved significantly through several iterations:
- Initial version: Simple mapping of characters to mouth shapes
- Enhanced version: Phoneme-based matching with timing controls
- Current version: Context-aware system with dynamic transitions
The animation manager now handles all aspects of character animation:
class_name AnimationManager extends Node

# Sprite references
@onready var mouth_sprite: Sprite2D = $MouthSprite
@onready var expression_sprite: Sprite2D = $ExpressionSprite
@onready var pose_sprite: Sprite2D = $PoseSprite

# Animation state
var current_mouth_shape := "idle"
var current_expression := "neutral"
var current_pose := "neutral"
var is_transitioning := false

# Mouth shape textures
var mouth_shapes = {
    "A": load("res://assets/A.jpg"),
    "O": load("res://assets/O.jpg"),
    "I": load("res://assets/I.jpg"),
    "S": load("res://assets/S.jpg"),
    "U": load("res://assets/U.jpg"),
    "M": load("res://assets/M.jpg"),
    "F": load("res://assets/F.jpg"),
    "L": load("res://assets/L.jpg"),
    "idle": load("res://assets/Relaxed.jpg")
}

# Phoneme mapping for improved accuracy
const PHONEME_MAP = {
    "ah": "A", "aa": "A", "ae": "A",
    "oh": "O", "ow": "O",
    "oo": "U",
    "ee": "I", "ih": "I", "eh": "I",
    "ss": "S", "sh": "S", "ch": "S",
    "mm": "M", "bb": "M", "pp": "M",
    "ff": "F", "vh": "F", "th": "F",
    "ll": "L", "tt": "L", "dd": "L"
}

# Transition timing
const TRANSITION_TIME = 0.1
const MIN_SHAPE_TIME = 0.05

func _ready() -> void:
    # Initialize with default states
    set_mouth_shape("idle")
    set_expression("neutral")
    set_pose("neutral")

func set_mouth_shape(shape: String) -> void:
    if not mouth_shapes.has(shape):
        print("Unknown mouth shape: ", shape)
        shape = "idle"
    if shape == current_mouth_shape:
        return
    if is_transitioning:
        await get_tree().create_timer(MIN_SHAPE_TIME).timeout
    is_transitioning = true
    current_mouth_shape = shape
    # Create tween for smooth transition
    var tween = create_tween()
    tween.tween_property(mouth_sprite, "texture", mouth_shapes[shape], TRANSITION_TIME)
    await tween.finished
    is_transitioning = false

# 'c' avoids shadowing GDScript's built-in char() function
func get_mouth_shape(c: String, context: String) -> String:
    # Check for phoneme matches first
    for phoneme in PHONEME_MAP:
        if context.contains(phoneme):
            return PHONEME_MAP[phoneme]
    # Fall back to single-character mapping
    var shape := "idle"
    match c.to_lower():
        "a", "h":
            shape = "A"  # Open mouth for 'ah' sounds
        "o", "w":
            shape = "O"  # Round mouth for 'oh' sounds
        "i", "e":
            shape = "I"  # Narrow mouth for 'ee' sounds
        "s", "z":
            shape = "S"  # Teeth showing for sibilants
        "u":
            shape = "U"  # Small round mouth for 'oo' sounds
        "m", "b", "p":
            shape = "M"  # Closed lips for labials
        "f", "v":
            shape = "F"  # Lower lip under teeth
        "l", "t", "d":
            shape = "L"  # Tongue position for dentals
    return shape

func set_expression(expression: String) -> void:
    if expression == current_expression:
        return
    current_expression = expression
    # Load and set expression sprite
    # Implementation depends on available expressions

func set_pose(pose: String) -> void:
    if pose == current_pose:
        return
    current_pose = pose
    # Load and set pose sprite
    # Implementation depends on available poses
The text reveal system has been optimized for smooth animation and natural pacing:
class_name TextRevealManager extends Node

# Timing constants
const CHAR_REVEAL_TIME = 0.025  # Seconds per character
const WORD_PAUSE_TIME = 0.075  # Pause at word boundaries
const PUNCTUATION_PAUSE_TIME = 0.15  # Pause at punctuation
const PARAGRAPH_PAUSE_TIME = 0.3  # Pause at paragraphs

# Reference to the AnimationManager (node path depends on the scene layout)
@onready var animation_manager: AnimationManager = $AnimationManager

# State tracking
var _is_revealing := false
var _current_text := ""
var _revealed_text := ""
var _current_pos := 0

# Signals
signal text_revealed(text: String)
signal char_revealed(character: String, pos: int)
signal reveal_completed

func reveal_text(text: String) -> void:
    if _is_revealing:
        cancel_reveal()
    _is_revealing = true
    _current_text = text
    _revealed_text = ""
    _current_pos = 0
    while _current_pos < _current_text.length():
        if not _is_revealing:
            break
        var c = _current_text[_current_pos]
        _revealed_text += c
        # Determine pause duration based on character context
        var delay = _get_delay_for_char(c)
        # Update mouth shape based on current and surrounding characters
        var context = _get_char_context(_current_pos)
        var shape = animation_manager.get_mouth_shape(c, context)
        animation_manager.set_mouth_shape(shape)
        # Emit signals for UI updates
        char_revealed.emit(c, _current_pos)
        text_revealed.emit(_revealed_text)
        await get_tree().create_timer(delay).timeout
        _current_pos += 1
    # Reset to idle mouth shape
    animation_manager.set_mouth_shape("idle")
    _is_revealing = false
    reveal_completed.emit()

func _get_delay_for_char(c: String) -> float:
    match c:
        " ":
            return WORD_PAUSE_TIME
        ".", "!", "?", ":":
            return PUNCTUATION_PAUSE_TIME
        "\n":  # Newline marks a paragraph break
            return PARAGRAPH_PAUSE_TIME
        _:
            return CHAR_REVEAL_TIME

func _get_char_context(pos: int, window: int = 2) -> String:
    var start = max(0, pos - window)
    var end = min(_current_text.length(), pos + window + 1)
    return _current_text.substr(start, end - start)

func cancel_reveal() -> void:
    if _is_revealing:
        _is_revealing = false
        # Immediately show full text
        _revealed_text = _current_text
        text_revealed.emit(_revealed_text)
        reveal_completed.emit()
Recent improvements to the animation system include:
- Phoneme-based mouth shape mapping for more accurate lip sync
- Smooth transitions between mouth shapes using tweens
- Context-aware shape selection considering surrounding characters
- Proper timing for natural speech rhythm
- Cancellable animations for better user interaction
- Debug logging for animation state tracking
- Type safety improvements throughout the system
The system now provides a more natural and fluid animation experience, with mouth shapes that better match the spoken text and smoother transitions between states.
Phase 3: Educational Features
We implemented a dynamic objective tracking system that:
- Monitors conversation topics in real-time
- Maps discussions to predefined learning objectives
- Provides contextual prompts to guide the conversation
The learning objectives are structured around key theological concepts:
const objectives = {
    "justification": {
        "title": "Justification by Faith",
        "description": "Understanding Luther's core doctrine of salvation through faith alone",
        "keywords": ["faith alone", "sola fide", "justification", "salvation", "grace"],
        "sources": ["bondage_of_the_will", "christian_liberty", "freedom_of_a_christian"],
        "prompts": [
            "What did you mean by 'faith alone'?",
            "How does one receive salvation?",
            "Why did you disagree with the Catholic view of works?"
        ]
    },
    "scripture": {
        "title": "Authority of Scripture",
        "description": "Exploring Luther's emphasis on biblical authority",
        "keywords": ["bible", "scripture", "sola scriptura", "word of god"],
        "sources": ["german_bible", "lectures_on_romans", "biblical_commentaries"],
        "prompts": [
            "Why translate the Bible into German?",
            "What role should scripture play in the church?",
            "How should Christians interpret the Bible?"
        ]
    },
    "church_reform": {
        "title": "Church Reform",
        "description": "Understanding Luther's critique of church practices",
        "keywords": ["indulgences", "95 theses", "reform", "corruption"],
        "sources": ["95_theses", "sermon_on_indulgences_and_grace", "address_to_the_diet_of_worms"],
        "prompts": [
            "What were your main criticisms of the church?",
            "Why did you write the 95 Theses?",
            "How did you envision church reform?"
        ]
    },
    "priesthood": {
        "title": "Priesthood of All Believers",
        "description": "Learning about Luther's view of Christian vocation",
        "keywords": ["priesthood", "believers", "vocation", "calling"],
        "sources": ["to_the_christian_nobility", "freedom_of_a_christian"],
        "prompts": [
            "What is the priesthood of all believers?",
            "How should Christians serve in their vocations?",
            "Why reject the distinction between clergy and laity?"
        ]
    }
}
The resource manager handles primary source integration:
class_name ResourceManager extends Node

var sources = {
    "bondage_of_the_will": {
        "title": "On the Bondage of the Will",
        "year": "1525",
        "url": "https://www.gutenberg.org/files/1471/1471-h/1471-h.htm",
        "description": "Luther's response to Erasmus on free will and salvation",
        "keywords": ["free will", "predestination", "salvation"]
    },
    "95_theses": {
        "title": "The Ninety-Five Theses",
        "year": "1517",
        "url": "https://www.gutenberg.org/files/274/274-h/274-h.htm",
        "description": "Luther's critique of indulgences that sparked the Reformation",
        "keywords": ["indulgences", "repentance", "reform"]
    },
    # Additional sources...
}
func find_relevant_resources(text: String) -> Array[Dictionary]:
    var relevant: Array[Dictionary] = []
    var words: Array[String] = []
    for part in text.split(" "):
        words.append(part.to_lower())
    for source_id in sources:
        var source = sources[source_id]
        var relevance = _calculate_relevance(words, source)
        if relevance > 0:
            relevant.append({
                "id": source_id,
                "relevance": relevance,
                "data": source
            })
    relevant.sort_custom(func(a, b): return a.relevance > b.relevance)
    return relevant.slice(0, 2)  # Return the two most relevant sources

func _calculate_relevance(words: Array[String], source: Dictionary) -> float:
    var score = 0.0
    for keyword in source.keywords:
        # A multi-word keyword matches only if every one of its words
        # appears somewhere in the text
        var variations: Array[String] = []
        for variation in keyword.split(" "):
            variations.append(variation.to_lower())
        var all_found := true
        for variation in variations:
            if not words.has(variation):
                all_found = false
                break
        if all_found:
            score += 1.0
    return score
The system automatically suggests relevant primary sources and tracks progress without interrupting the natural flow of conversation. The tabbed interface allows users to easily switch between the conversation and educational resources:
# UI setup for tabbed interface
@onready var tab_container = $TabContainer
@onready var chat_tab = $TabContainer/Chat
@onready var sources_tab = $TabContainer/Sources
@onready var progress_tab = $TabContainer/Progress

func _ready() -> void:
    # Initialize tabs
    tab_container.set_tab_title(0, "Chat")
    tab_container.set_tab_title(1, "Sources")
    tab_container.set_tab_title(2, "Progress")
    # Set up resource display
    sources_tab.clear()
    progress_tab.clear()
    # Connect signals
    tab_container.tab_changed.connect(_on_tab_changed)

func update_resources(response: String) -> void:
    var relevant = resource_manager.find_relevant_resources(response)
    sources_tab.clear()
    for resource in relevant:
        var source = resource.data
        var text = "[b]%s[/b] (%s)\n%s\n[url]%s[/url]\n\n" % [
            source.title,
            source.year,
            source.description,
            source.url
        ]
        sources_tab.append_text(text)

func update_progress(response: String) -> void:
    var completed = []
    for objective_id in objectives:
        var objective = objectives[objective_id]
        if _check_completion(response, objective):
            completed.append(objective.title)
    progress_tab.clear()
    for title in completed:
        progress_tab.append_text("✓ " + title + "\n")
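The `_check_completion` helper that `update_progress` relies on isn’t shown above; a simple version could just count keyword hits against the objective (the two-keyword threshold here is an arbitrary illustrative choice, not the project’s exact rule):

```gdscript
# Hypothetical sketch of the completion check used by update_progress().
func _check_completion(response: String, objective: Dictionary) -> bool:
    var lower := response.to_lower()
    var hits := 0
    for keyword in objective.keywords:
        if lower.contains(keyword):
            hits += 1
    # Treat the objective as covered once two of its keywords appear
    return hits >= 2
```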
Technical Challenges Overcome
API Key Security
- Implemented multiple fallback options for API key storage: environment variable (OPENAI_API_KEY) first, then configuration file (openai_config.cfg)
- Added key caching to reduce file system access
- Created template system for safe version control
- Implemented proper error handling for missing keys
- Added debug logging for key loading process
Response Streaming
- Developed custom streaming solution for real-time text reveal
- Implemented character-by-character animation triggering
- Added variable timing based on character context
- Optimized performance for smooth animation
- Implemented proper pause timing for punctuation
- Added cancellable reveal for better user experience
- Created signal system for UI synchronization
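The signal system keeps the UI loosely coupled to the reveal logic: the interface only subscribes to the reveal manager’s signals. A sketch of that wiring (node paths here are illustrative):

```gdscript
# Hypothetical wiring between the reveal manager and a RichTextLabel.
@onready var dialogue_label: RichTextLabel = $DialogueLabel
@onready var reveal_manager: TextRevealManager = $TextRevealManager

func _ready() -> void:
    # Redraw the label every time another character is revealed
    reveal_manager.text_revealed.connect(
        func(text): dialogue_label.text = text)
    # React once the full response is on screen
    reveal_manager.reveal_completed.connect(
        func(): print("Reveal finished"))
```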
Resource Management
- Created relevance scoring algorithm for primary sources
- Implemented URL validation and security checks
- Developed caching system for frequently accessed resources
- Added type safety for string arrays and dictionaries
- Improved keyword matching with term variations
Animation System
- Resolved mouth shape loading issues by switching from preload to load
- Fixed type mismatches in animation manager
- Optimized sprite transitions for smoother animation
- Added debug logging for animation state tracking
- Improved timing system for more natural speech rhythm
- Implemented context-aware mouth shape selection
Future Development
Voice Synthesis Integration
- Investigating integration with ElevenLabs for period-appropriate voice
- Planning lip-sync improvements for voice output
Enhanced Animation System
- Implementing gesture blending for smoother transitions
- Adding emotional state machine for more natural expressions
- Improving mouth shape transitions with interpolation
- Adding more varied and dynamic expressions
- Implementing advanced timing algorithms for speech patterns
- Adding support for emphasized syllables and stress patterns
Educational Features
- Developing a curriculum builder for teachers
- Adding support for custom primary sources
- Creating a progress tracking dashboard
- Implementing a more sophisticated relevance scoring system
- Adding support for multiple languages and translations
Security and Configuration
- Implementing encrypted storage for sensitive data
- Adding user-specific configuration profiles
- Improving error handling and recovery
- Enhancing logging and debugging tools
Lessons Learned
AI Integration
- Careful API key management is critical – we implemented multiple fallback options (environment variables, config files) and proper validation
- Prompt engineering requires extensive testing – we went through many iterations to find the right balance of historical accuracy, educational value, and natural conversation flow
- Configuration management needs to balance security with ease of use
Animation Systems
- Real-time animation is deceptively simple: a basic version is quick to implement, but achieving natural, fluid motion requires significant expertise and iteration
- Performance wasn’t a bottleneck since we kept the system lightweight, focusing on essential animations rather than complex blending
- Type safety in Godot 4 requires careful attention, especially when working with arrays and string operations
- Timing is crucial for natural-feeling animation – small adjustments can make a big difference
- Context-aware animation decisions lead to more realistic results
Educational Design
- While we built a foundation for learning features, we didn’t have time to thoroughly test them with real users
- The dynamic resource system proved more valuable than initially expected, helping users dive deeper into topics naturally
- Balancing historical accuracy with accessibility requires careful consideration
- Primary sources need clear context and explanation to be useful
Conclusion
This project demonstrates how game development technology can be combined with AI to create engaging educational experiences. The Martin Luther avatar shows that historical education doesn’t have to be static – it can be interactive, personalized, and still maintain historical accuracy.
The combination of LLM technology with real-time animation and educational features creates a unique learning environment that engages users while maintaining historical authenticity. While there’s still room for improvement, particularly in animation sophistication and educational feature testing, the foundation is solid for future development.
Jonas Heinke