# AgentLite Comprehensive Test Suite Plan

## Context

AgentLite is a lightweight, async-first Agent component library for LLM applications. It provides:

- **Agent**: Main agent class with tool calling loop and streaming support
- **OpenAIProvider**: OpenAI-compatible provider implementation
- **Tool System**: `@tool` decorator, CallableTool, CallableTool2, SimpleToolset
- **MCPClient**: MCP server integration
- **Message Types**: ContentPart, Message, ToolCall, etc.
- **Configuration**: Pydantic-based config models

## Test Location

`/home/tcmofashi/proj/general_agent/agentlite/tests/`

## Task Dependency Graph

| Task | Depends On | Reason |
|------|------------|--------|
| 1. Test Configuration Setup | None | Foundation for all tests |
| 2. Message Types Unit Tests | Task 1 | Core data structures |
| 3. Tool System Unit Tests | Task 1 | Core tool abstractions |
| 4. Configuration Unit Tests | Task 1 | Config validation |
| 5. Provider Protocol Unit Tests | Task 1 | Provider interface |
| 6. Mock Provider Implementation | Task 1 | Required for integration tests |
| 7. Agent Integration Tests | Tasks 2, 3, 6 | Tests agent with mocked provider |
| 8. Tool Calling Loop Tests | Tasks 3, 6 | Tests tool execution flow |
| 9. Streaming Response Tests | Tasks 2, 6 | Tests streaming functionality |
| 10. Conversation History Tests | Task 7 | Tests history management |
| 11. Real-World Scenario: Data Quality Agent | Tasks 7, 8 | Practical use case |
| 12. Real-World Scenario: Fact-Checking Agent | Tasks 7, 8 | Practical use case |
| 13. Real-World Scenario: Multi-Agent Workflow | Tasks 7, 10 | Practical use case |
| 14. MCP Mock Tests | Tasks 3, 6 | Tests MCP integration with mocks |
| 15. Error Handling Tests | Tasks 6, 7 | Tests error scenarios |
| 16. Test Coverage Analysis | All above | Verify coverage targets |

## Parallel Execution Graph

```
Wave 1 (Foundation - Start immediately):
├── Task 1: Test Configuration Setup
├── Task 2: Message Types Unit Tests
├── Task 3: Tool System Unit Tests
├── Task 4: Configuration Unit Tests
├── Task 5: Provider Protocol Unit Tests
└── Task 6: Mock Provider Implementation

Wave 2 (Core Integration - After Wave 1):
├── Task 7: Agent Integration Tests (depends: 1, 2, 3, 6)
├── Task 8: Tool Calling Loop Tests (depends: 3, 6)
└── Task 9: Streaming Response Tests (depends: 2, 6)

Wave 3 (Advanced Features - After Wave 2):
├── Task 10: Conversation History Tests (depends: 7)
├── Task 14: MCP Mock Tests (depends: 3, 6)
└── Task 15: Error Handling Tests (depends: 6, 7)

Wave 4 (Real-World Scenarios - After Wave 3):
├── Task 11: Data Quality Agent Scenario (depends: 7, 8)
├── Task 12: Fact-Checking Agent Scenario (depends: 7, 8)
└── Task 13: Multi-Agent Workflow Scenario (depends: 7, 10)

Wave 5 (Finalization - After Wave 4):
└── Task 16: Test Coverage Analysis (depends: all)

Critical Path: Task 1 → Task 6 → Task 7 → Task 10 → Task 13 → Task 16
Parallel Speedup: ~60% faster than sequential execution
```

## Tasks

### Task 1: Test Configuration Setup

**Description**: Create pytest configuration, `conftest.py` with shared fixtures, and test utilities.
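To make the "test utilities" deliverable concrete, here is a minimal sketch of what `tests/utils.py` could contain. The helper names and the `agentlite` import path are assumptions for illustration, not existing code:

```python
# tests/utils.py -- illustrative sketch only; helper names and the import path
# are assumptions, not part of the current codebase.
from agentlite import Message  # assumed import path


def assert_roles(history: list[Message], expected: list[str]) -> None:
    """Assert that a conversation history has the expected role sequence."""
    assert [m.role for m in history] == expected


def assert_text_contains(message: Message, fragment: str) -> None:
    """Assert that a message's extracted text contains the given fragment."""
    assert fragment in message.extract_text()
```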
**Delegation Recommendation**:
- Category: `quick` - Configuration setup is straightforward
- Skills: [`python-programmer`] - Python testing infrastructure knowledge

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for pytest configuration and fixture design
- OMITTED `git-master`: No git operations needed for this task
- OMITTED `frontend-ui-ux`: No UI work involved

**Depends On**: None

**Acceptance Criteria**:
- [ ] `pytest.ini` configured with asyncio mode
- [ ] `conftest.py` with shared fixtures (mock_provider, sample_messages, temp_agent)
- [ ] Test utilities module for common assertions
- [ ] All tests can be run with `pytest tests/`

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/conftest.py`
- `/home/tcmofashi/proj/general_agent/agentlite/tests/utils.py`

**Commit**: YES
- Message: `test: setup pytest configuration and shared fixtures`
- Files: `tests/conftest.py`, `tests/utils.py`

---

### Task 2: Message Types Unit Tests

**Description**: Test all message types: ContentPart, TextPart, ImageURLPart, AudioURLPart, ToolCall, ToolCallPart, Message.

**Delegation Recommendation**:
- Category: `quick` - Unit tests for data structures
- Skills: [`python-programmer`] - Python testing patterns

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for writing unit tests
- OMITTED `frontend-ui-ux`: No UI involved

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] ContentPart polymorphic validation works correctly
- [ ] TextPart merge_in_place works for streaming
- [ ] ToolCall merge_in_place works with ToolCallPart
- [ ] Message content coercion from string works
- [ ] Message.extract_text() returns correct text
- [ ] Message.has_tool_calls() returns correct boolean
- [ ] All edge cases covered (empty content, None values)

**Test Cases**:
1. `test_content_part_registry` - Verify subclass registration
2. `test_text_part_creation` - Basic TextPart instantiation
3. `test_text_part_merge` - Streaming text merge
4. `test_image_url_part` - ImageURLPart creation and serialization
5. `test_audio_url_part` - AudioURLPart creation and serialization
6. `test_tool_call_creation` - ToolCall instantiation
7. `test_tool_call_merge` - ToolCall merging with ToolCallPart
8. `test_message_string_content` - Message with string content coercion
9. `test_message_list_content` - Message with list of ContentParts
10. `test_message_extract_text` - Text extraction from mixed content
11. `test_message_has_tool_calls` - Tool call detection
12. `test_message_serialization` - Pydantic model_dump works

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_message.py`

**Commit**: YES
- Message: `test: add unit tests for message types`
- Files: `tests/unit/test_message.py`

---

### Task 3: Tool System Unit Tests

**Description**: Test the tool system: Tool, CallableTool, CallableTool2, SimpleToolset, the `@tool` decorator, and the ToolResult types.
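For orientation, a hedged sketch of `test_tool_decorator_basic`. The attribute names checked here (`name`, `description`, `parameters`) are assumptions about the Tool interface rather than confirmed API:

```python
# Sketch of test_tool_decorator_basic -- attribute names are assumptions.
from agentlite import tool  # assumed import path


@tool()
async def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b


def test_tool_decorator_basic():
    # The decorator is expected to derive tool metadata from the function
    # name, docstring, and type hints.
    assert add.name == "add"                     # assumed attribute
    assert "Add two numbers" in add.description  # assumed attribute
    assert "a" in add.parameters["properties"]   # assumed JSON-schema layout
```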
**Delegation Recommendation**:
- Category: `unspecified-low` - Moderate complexity with async patterns
- Skills: [`python-programmer`] - Python async testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for async tool testing
- OMITTED `frontend-ui-ux`: No UI involved

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] Tool JSON schema validation works
- [ ] CallableTool validates arguments against schema
- [ ] CallableTool2 uses Pydantic for validation
- [ ] SimpleToolset manages tools correctly
- [ ] @tool decorator creates valid tools
- [ ] Tool execution handles errors gracefully
- [ ] Async tool execution works correctly

**Test Cases**:
1. `test_tool_schema_validation` - Invalid schema raises ValueError
2. `test_tool_ok_result` - ToolOk creation and properties
3. `test_tool_error_result` - ToolError creation and properties
4. `test_callable_tool_validation` - Argument validation against schema
5. `test_callable_tool_execution` - Successful tool execution
6. `test_callable_tool_error_handling` - Exception handling in tools
7. `test_callable_tool2_pydantic_validation` - Pydantic model validation
8. `test_callable_tool2_execution` - Type-safe tool execution
9. `test_simple_toolset_add_remove` - Tool management
10. `test_simple_toolset_handle` - Tool call handling
11. `test_simple_toolset_tool_not_found` - Missing tool error
12. `test_tool_decorator_basic` - @tool creates valid tool
13. `test_tool_decorator_with_params` - @tool with custom name/description
14. `test_tool_decorator_type_hints` - Type hint to schema conversion
15. `test_tool_concurrent_execution` - Multiple tools execute concurrently

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_tool.py`

**Commit**: YES
- Message: `test: add unit tests for tool system`
- Files: `tests/unit/test_tool.py`

---

### Task 4: Configuration Unit Tests

**Description**: Test the Pydantic configuration models: ProviderConfig, ModelConfig, ToolConfig, AgentConfig.

**Delegation Recommendation**:
- Category: `quick` - Pydantic model validation tests
- Skills: [`python-programmer`] - Pydantic testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for Pydantic validation tests

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] ProviderConfig validates base_url format
- [ ] ProviderConfig stores api_key as SecretStr
- [ ] ModelConfig validates temperature range
- [ ] ModelConfig validates provider is not empty
- [ ] AgentConfig validates default_model exists in models
- [ ] AgentConfig validates all model providers exist
- [ ] get_provider_config and get_model_config work correctly

**Test Cases**:
1. `test_provider_config_validation` - Valid config creation
2. `test_provider_config_invalid_url` - Invalid base_url raises error
3. `test_provider_config_secret_str` - API key is SecretStr
4. `test_model_config_validation` - Valid model config
5. `test_model_config_temperature_range` - Temperature bounds checking
6. `test_model_config_empty_provider` - Empty provider raises error
7. `test_agent_config_validation` - Valid agent config
8. `test_agent_config_missing_default_model` - Missing default_model raises error
9. `test_agent_config_unknown_provider` - Unknown provider raises error
10. `test_agent_config_get_provider` - get_provider_config works
11. `test_agent_config_get_model` - get_model_config works
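A hedged sketch of two of these cases; the exact field names and constructor signatures are assumptions based on the acceptance criteria above:

```python
# Sketch of test_provider_config_invalid_url / test_model_config_temperature_range.
# Field names and the import path are assumptions.
import pytest
from pydantic import ValidationError

from agentlite.config import ModelConfig, ProviderConfig  # assumed path


def test_provider_config_invalid_url():
    # An invalid base_url should be rejected by the Pydantic validator.
    with pytest.raises(ValidationError):
        ProviderConfig(base_url="not-a-url", api_key="sk-test")


def test_model_config_temperature_range():
    # Temperatures outside the allowed range should fail validation.
    with pytest.raises(ValidationError):
        ModelConfig(provider="openai", model="gpt-4o", temperature=5.0)
```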
**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_config.py`

**Commit**: YES
- Message: `test: add unit tests for configuration models`
- Files: `tests/unit/test_config.py`

---

### Task 5: Provider Protocol Unit Tests

**Description**: Test the provider protocol and exception types: ChatProvider, StreamedMessage, TokenUsage, and the exception hierarchy.

**Delegation Recommendation**:
- Category: `quick` - Protocol and exception testing
- Skills: [`python-programmer`] - Python protocol testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for protocol testing

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] TokenUsage calculates total correctly
- [ ] Exception hierarchy is correct
- [ ] APIStatusError stores status_code
- [ ] ChatProvider protocol can be implemented

**Test Cases**:
1. `test_token_usage_total` - Total token calculation
2. `test_token_usage_defaults` - Default cached_tokens = 0
3. `test_chat_provider_error_base` - Base exception class
4. `test_api_connection_error` - APIConnectionError creation
5. `test_api_timeout_error` - APITimeoutError creation
6. `test_api_status_error` - APIStatusError with status_code
7. `test_api_empty_response_error` - APIEmptyResponseError creation
8. `test_chat_provider_protocol` - Protocol implementation check

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/unit/test_provider.py`

**Commit**: YES
- Message: `test: add unit tests for provider protocol`
- Files: `tests/unit/test_provider.py`

---

### Task 6: Mock Provider Implementation

**Description**: Create a comprehensive mock provider for testing that simulates OpenAI API responses without real API calls.

**Delegation Recommendation**:
- Category: `unspecified-low` - Requires understanding of streaming and async patterns
- Skills: [`python-programmer`] - Async generator implementation

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for mock provider implementation

**Depends On**: Task 1

**Acceptance Criteria**:
- [ ] MockProvider implements ChatProvider protocol
- [ ] Can simulate text responses
- [ ] Can simulate tool calls
- [ ] Can simulate streaming responses
- [ ] Can simulate errors
- [ ] Configurable response sequences
- [ ] Tracks calls for verification

**Implementation Details**:

```python
class MockProvider:
    """Mock provider for testing.

    Usage:
        provider = MockProvider()
        provider.add_text_response("Hello!")
        provider.add_tool_call("add", {"a": 1, "b": 2}, "3")

        agent = Agent(provider=provider)
        response = await agent.run("Hi")
        assert provider.calls == [...]
    """
```

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/mocks/provider.py`

**Commit**: YES
- Message: `test: add mock provider for testing`
- Files: `tests/mocks/provider.py`

---

### Task 7: Agent Integration Tests

**Description**: Test the Agent class with a mocked provider: initialization, run(), generate(), and history management.
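A hedged sketch of `test_agent_run_simple`, reusing the MockProvider from Task 6. Import paths are assumptions; `asyncio_mode = auto` in `pytest.ini` lets the async test run without an explicit marker:

```python
# Sketch of test_agent_run_simple -- import paths are assumptions.
from agentlite import Agent
from tests.mocks.provider import MockProvider


async def test_agent_run_simple():
    provider = MockProvider()
    provider.add_text_response("Hello!")

    agent = Agent(provider=provider)
    response = await agent.run("Hi")

    # The mocked text should come back as the run() result, and exactly one
    # provider call should have been recorded.
    assert response == "Hello!"
    assert len(provider.calls) == 1
```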
**Delegation Recommendation**:
- Category: `unspecified-low` - Integration testing with async
- Skills: [`python-programmer`] - Async integration testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for agent integration testing

**Depends On**: Tasks 1, 2, 3, 6

**Acceptance Criteria**:
- [ ] Agent initializes correctly with provider
- [ ] Agent.run() returns string response
- [ ] Agent.run(stream=True) returns async iterator
- [ ] Agent.generate() returns Message
- [ ] Agent adds messages to history
- [ ] Agent.clear_history() clears history
- [ ] Agent respects max_iterations

**Test Cases**:
1. `test_agent_initialization` - Basic agent creation
2. `test_agent_with_tools` - Agent with toolset
3. `test_agent_run_simple` - Simple non-streaming run
4. `test_agent_run_streaming` - Streaming response
5. `test_agent_generate` - Generate without tool loop
6. `test_agent_history_tracking` - Messages added to history
7. `test_agent_clear_history` - History cleared correctly
8. `test_agent_max_iterations` - Respects iteration limit
9. `test_agent_system_prompt` - System prompt used

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_agent.py`

**Commit**: YES
- Message: `test: add agent integration tests`
- Files: `tests/integration/test_agent.py`

---

### Task 8: Tool Calling Loop Tests

**Description**: Test the complete tool calling loop: the agent requests a tool, the tool executes, and the result is returned.

**Delegation Recommendation**:
- Category: `unspecified-low` - Complex async flow testing
- Skills: [`python-programmer`] - Async flow testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for tool loop testing

**Depends On**: Tasks 3, 6

**Acceptance Criteria**:
- [ ] Agent calls tool when requested by LLM
- [ ] Tool result is added to history
- [ ] Agent continues conversation after tool result
- [ ] Multiple tool calls in one response handled
- [ ] Tool errors are handled gracefully
- [ ] Tool calls are concurrent

**Test Cases**:
1. `test_single_tool_call` - One tool call in conversation
2. `test_multiple_tool_calls` - Multiple tools in one response
3. `test_tool_call_chain` - Sequential tool calls
4. `test_tool_error_handling` - Tool returns error
5. `test_tool_not_found` - Unknown tool requested
6. `test_tool_concurrent_execution` - Tools execute concurrently
7. `test_tool_result_in_history` - Tool results in conversation history
8. `test_tool_call_with_arguments` - Arguments passed correctly

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_tool_loop.py`

**Commit**: YES
- Message: `test: add tool calling loop tests`
- Files: `tests/integration/test_tool_loop.py`

---

### Task 9: Streaming Response Tests

**Description**: Test streaming responses: text streaming, tool call streaming, mixed content.

**Delegation Recommendation**:
- Category: `unspecified-low` - Async streaming testing
- Skills: [`python-programmer`] - Async generator testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for streaming testing

**Depends On**: Tasks 2, 6

**Acceptance Criteria**:
- [ ] Text streams in chunks
- [ ] Tool calls stream correctly
- [ ] Mixed content (text + tool) streams correctly
- [ ] Complete response can be reconstructed
- [ ] Streaming works with tool calling loop

**Test Cases**:
1. `test_stream_text_only` - Simple text streaming
2. `test_stream_tool_call` - Tool call streaming
3. `test_stream_mixed_content` - Text then tool call
4. `test_stream_reconstruction` - Rebuild full response
5. `test_stream_with_tool_loop` - Streaming in tool loop
6. `test_stream_empty_response` - Empty stream handling

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_streaming.py`

**Commit**: YES
- Message: `test: add streaming response tests`
- Files: `tests/integration/test_streaming.py`

---

### Task 10: Conversation History Tests

**Description**: Test conversation history management: message ordering, role tracking, history limits.

**Delegation Recommendation**:
- Category: `quick` - History management testing
- Skills: [`python-programmer`] - State management testing

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for history testing

**Depends On**: Task 7

**Acceptance Criteria**:
- [ ] Messages added in correct order
- [ ] Roles tracked correctly (user, assistant, tool)
- [ ] Tool call IDs preserved
- [ ] History can be inspected
- [ ] History can be cleared
- [ ] History persists across multiple runs

**Test Cases**:
1. `test_history_message_order` - Messages in correct order
2. `test_history_roles` - Correct role tracking
3. `test_history_tool_responses` - Tool call IDs preserved
4. `test_history_persistence` - History across multiple runs
5. `test_history_clear` - Clear history works
6. `test_history_manual_add` - Manually add messages
7. `test_history_copy` - history property returns copy

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_history.py`

**Commit**: YES
- Message: `test: add conversation history tests`
- Files: `tests/integration/test_history.py`

---

### Task 11: Real-World Scenario - Data Quality Agent

**Description**: Test a realistic data quality improvement agent that validates and cleans data.

**Delegation Recommendation**:
- Category: `unspecified-high` - Complex scenario testing
- Skills: [`python-programmer`] - Complex test scenario design

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for scenario implementation

**Depends On**: Tasks 7, 8

**Acceptance Criteria**:
- [ ] Agent validates data format
- [ ] Agent identifies data quality issues
- [ ] Agent suggests corrections
- [ ] Uses multiple tools (validate, clean, analyze)
- [ ] Handles edge cases (empty data, invalid format)

**Scenario**:

```python
# Data Quality Agent validates CSV data
# Tools: validate_csv, detect_anomalies, suggest_fixes
# Test with sample data containing errors
```

**Test Cases**:
1. `test_data_quality_valid_data` - Clean data passes validation
2. `test_data_quality_detects_errors` - Errors detected and reported
3. `test_data_quality_suggests_fixes` - Corrections suggested
4. `test_data_quality_empty_data` - Handles empty input
5. `test_data_quality_invalid_format` - Handles format errors

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_data_quality.py`

**Commit**: YES
- Message: `test: add data quality agent scenario tests`
- Files: `tests/scenarios/test_data_quality.py`

---

### Task 12: Real-World Scenario - Fact-Checking Agent

**Description**: Test a fact-checking agent that verifies claims using tools.
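A hedged sketch of `test_fact_check_true_claim`; the `search_facts` tool body, the import paths, and the MockProvider response sequencing are illustrative assumptions:

```python
# Sketch of test_fact_check_true_claim -- imports and canned outputs assumed.
from agentlite import Agent, tool
from tests.mocks.provider import MockProvider


@tool()
async def search_facts(query: str) -> str:
    """Return a canned lookup result for testing."""
    return "Water boils at 100 degrees Celsius at sea level."


async def test_fact_check_true_claim():
    provider = MockProvider()
    provider.add_tool_call("search_facts", {"query": "boiling point of water"}, "100 C")
    provider.add_text_response("Verdict: TRUE - water boils at 100 degrees Celsius at sea level.")

    agent = Agent(provider=provider, tools=[search_facts])
    verdict = await agent.run("Fact-check: water boils at 100 degrees Celsius at sea level.")

    assert "TRUE" in verdict
```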
**Delegation Recommendation**:
- Category: `unspecified-high` - Complex scenario testing
- Skills: [`python-programmer`] - Complex test scenario design

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for scenario implementation

**Depends On**: Tasks 7, 8

**Acceptance Criteria**:
- [ ] Agent extracts claims from text
- [ ] Agent uses search tool to verify
- [ ] Agent provides verdict with evidence
- [ ] Handles uncertain claims appropriately
- [ ] Multiple claims in one text handled

**Scenario**:

```python
# Fact-Checking Agent verifies statements
# Tools: search_facts, calculate_statistics, check_date
# Test with verifiable and unverifiable claims
```

**Test Cases**:
1. `test_fact_check_true_claim` - Correctly identifies true claim
2. `test_fact_check_false_claim` - Correctly identifies false claim
3. `test_fact_check_multiple_claims` - Multiple claims in one text
4. `test_fact_check_uncertain` - Handles uncertain claims
5. `test_fact_check_with_evidence` - Provides supporting evidence

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_fact_checking.py`

**Commit**: YES
- Message: `test: add fact-checking agent scenario tests`
- Files: `tests/scenarios/test_fact_checking.py`

---

### Task 13: Real-World Scenario - Multi-Agent Workflow

**Description**: Test multiple agents collaborating on a complex task.

**Delegation Recommendation**:
- Category: `unspecified-high` - Complex multi-agent testing
- Skills: [`python-programmer`] - Complex scenario design

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for multi-agent testing

**Depends On**: Tasks 7, 10

**Acceptance Criteria**:
- [ ] Multiple agents can share a provider
- [ ] Agents maintain separate histories
- [ ] Workflow stages execute in order
- [ ] Output from one agent feeds into next
- [ ] Each agent has specialized role

**Scenario**:

```python
# Research → Write → Edit workflow
# Researcher gathers facts
# Writer creates content
# Editor reviews and improves
```

**Test Cases**:
1. `test_multi_agent_research_write` - Research to writer flow
2. `test_multi_agent_with_editor` - Three-agent workflow
3. `test_multi_agent_isolated_histories` - Histories don't leak
4. `test_multi_agent_shared_provider` - Provider shared correctly
5. `test_multi_agent_error_handling` - Errors don't break workflow

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/scenarios/test_multi_agent.py`

**Commit**: YES
- Message: `test: add multi-agent workflow scenario tests`
- Files: `tests/scenarios/test_multi_agent.py`

---

### Task 14: MCP Mock Tests

**Description**: Test MCP integration with a mocked MCP server.

**Delegation Recommendation**:
- Category: `unspecified-low` - MCP protocol mocking
- Skills: [`python-programmer`] - Protocol mocking

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for MCP mocking

**Depends On**: Tasks 3, 6

**Acceptance Criteria**:
- [ ] MCPClient connects to mock server
- [ ] Tools load from mock server
- [ ] MCP tools execute correctly
- [ ] MCP errors handled gracefully
- [ ] Connection cleanup works

**Test Cases**:
1. `test_mcp_connect_stdio` - STDIO connection mock
2. `test_mcp_connect_sse` - SSE connection mock
3. `test_mcp_load_tools` - Load tools from mock
4. `test_mcp_tool_execution` - Execute MCP tool
5. `test_mcp_error_handling` - MCP errors handled
6. `test_mcp_context_manager` - Async context manager works
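A minimal sketch of the fake server that `tests/mocks/mcp_server.py` could provide. The `list_tools`/`call_tool` surface mirrors common MCP server interfaces but is an assumption here, not the MCPClient API:

```python
# tests/mocks/mcp_server.py -- illustrative sketch; method names are assumptions.
from typing import Any, Callable


class MockMCPServer:
    """In-memory stand-in for an MCP server used by the integration tests."""

    def __init__(self, tools: dict[str, Callable[..., Any]]):
        self._tools = tools
        self.calls: list[tuple[str, dict]] = []

    async def list_tools(self) -> list[str]:
        """Return the names of the tools this fake server exposes."""
        return list(self._tools)

    async def call_tool(self, name: str, arguments: dict) -> str:
        """Record the call and dispatch to the registered tool function."""
        self.calls.append((name, arguments))
        return str(self._tools[name](**arguments))
```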
**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/mocks/mcp_server.py`
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_mcp.py`

**Commit**: YES
- Message: `test: add MCP integration tests with mocks`
- Files: `tests/mocks/mcp_server.py`, `tests/integration/test_mcp.py`

---

### Task 15: Error Handling Tests

**Description**: Test error scenarios: provider errors, tool errors, timeouts, connection issues.

**Delegation Recommendation**:
- Category: `unspecified-low` - Error scenario testing
- Skills: [`python-programmer`] - Error testing patterns

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for error testing

**Depends On**: Tasks 6, 7

**Acceptance Criteria**:
- [ ] APIConnectionError handled correctly
- [ ] APITimeoutError handled correctly
- [ ] APIStatusError handled correctly
- [ ] Tool execution errors don't crash agent
- [ ] Invalid tool arguments handled
- [ ] Max iterations prevents infinite loops

**Test Cases**:
1. `test_provider_connection_error` - Connection failure
2. `test_provider_timeout_error` - Request timeout
3. `test_provider_status_error` - HTTP error status
4. `test_provider_empty_response` - Empty response handling
5. `test_tool_execution_error` - Tool raises exception
6. `test_tool_invalid_arguments` - Invalid args to tool
7. `test_tool_not_found_error` - Unknown tool called
8. `test_max_iterations_reached` - Loop prevention
9. `test_json_decode_error` - Invalid JSON in tool args

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/integration/test_errors.py`

**Commit**: YES
- Message: `test: add error handling tests`
- Files: `tests/integration/test_errors.py`

---

### Task 16: Test Coverage Analysis

**Description**: Analyze test coverage and ensure targets are met.
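The `.coveragerc` this task produces might look roughly like the following. This is a sketch using standard coverage.py options; the 80% floor matches the overall target in the acceptance criteria:

```ini
# .coveragerc -- illustrative sketch using standard coverage.py options.
[run]
source = agentlite
branch = True
omit =
    */tests/*

[report]
show_missing = True
fail_under = 80
```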
**Delegation Recommendation**:
- Category: `quick` - Coverage analysis
- Skills: [`python-programmer`] - Coverage tooling

**Skills Evaluation**:
- INCLUDED `python-programmer`: Required for coverage analysis

**Depends On**: All previous tasks

**Acceptance Criteria**:
- [ ] Overall coverage >= 80%
- [ ] Core modules (message, tool, agent) >= 90%
- [ ] Provider module >= 70%
- [ ] MCP module >= 60%
- [ ] Coverage report generated
- [ ] Missing coverage documented

**Coverage Targets**:

| Module | Target | Priority |
|--------|--------|----------|
| agentlite.message | 95% | P0 |
| agentlite.tool | 95% | P0 |
| agentlite.agent | 90% | P0 |
| agentlite.config | 90% | P0 |
| agentlite.provider | 80% | P1 |
| agentlite.providers.openai | 70% | P1 |
| agentlite.mcp | 60% | P2 |

**Files to Create**:
- `/home/tcmofashi/proj/general_agent/agentlite/tests/.coveragerc`

**Commit**: YES
- Message: `test: add coverage configuration and analysis`
- Files: `tests/.coveragerc`

---

## Test File Structure

```
/home/tcmofashi/proj/general_agent/agentlite/tests/
├── conftest.py               # Shared fixtures and configuration
├── utils.py                  # Test utilities and helpers
├── .coveragerc               # Coverage configuration
├── unit/                     # Unit tests
│   ├── __init__.py
│   ├── test_message.py       # Message types tests
│   ├── test_tool.py          # Tool system tests
│   ├── test_config.py        # Configuration tests
│   └── test_provider.py      # Provider protocol tests
├── integration/              # Integration tests
│   ├── __init__.py
│   ├── test_agent.py         # Agent integration tests
│   ├── test_tool_loop.py     # Tool calling loop tests
│   ├── test_streaming.py     # Streaming tests
│   ├── test_history.py       # History management tests
│   ├── test_mcp.py           # MCP integration tests
│   └── test_errors.py        # Error handling tests
├── scenarios/                # Real-world scenario tests
│   ├── __init__.py
│   ├── test_data_quality.py  # Data quality agent
│   ├── test_fact_checking.py # Fact-checking agent
│   └── test_multi_agent.py   # Multi-agent workflow
└── mocks/                    # Mock implementations
    ├── __init__.py
    ├── provider.py           # Mock OpenAI provider
    └── mcp_server.py         # Mock MCP server
```

## Test Fixtures (conftest.py)

### Core Fixtures

```python
# conftest.py -- shared fixtures; import lines (pytest, Agent, Message, ToolCall,
# tool, MockProvider) are omitted in this sketch.

# Mock provider fixtures
@pytest.fixture
def mock_provider():
    """Create a mock provider with no responses configured."""
    return MockProvider()

@pytest.fixture
def mock_provider_with_response():
    """Create a mock provider that returns a simple text response."""
    provider = MockProvider()
    provider.add_text_response("Hello!")
    return provider

# Sample message fixtures
@pytest.fixture
def sample_text_message():
    """Create a sample text message."""
    return Message(role="user", content="Hello!")

@pytest.fixture
def sample_tool_call():
    """Create a sample tool call."""
    return ToolCall(
        id="call_123",
        function=ToolCall.FunctionBody(
            name="add",
            arguments='{"a": 1, "b": 2}'
        )
    )

# Tool fixtures
@pytest.fixture
def add_tool():
    """Create a simple add tool."""
    @tool()
    async def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b
    return add

@pytest.fixture
def error_tool():
    """Create a tool that raises an error."""
    @tool()
    async def error() -> str:
        """Always raises an error."""
        raise ValueError("Test error")
    return error

# Agent fixtures
@pytest.fixture
async def simple_agent(mock_provider):
    """Create a simple agent with mocked provider."""
    return Agent(provider=mock_provider)

@pytest.fixture
async def agent_with_tools(mock_provider, add_tool):
    """Create an agent with tools."""
    return Agent(provider=mock_provider, tools=[add_tool])
```

## Mock Implementations

### MockProvider
```python
import json

# TextPart and ToolCall come from agentlite; MockCall and MockStreamedMessage
# are small test helpers defined alongside this class (not shown here).


class MockProvider:
    """Mock provider for testing AgentLite without real API calls.

    This provider simulates OpenAI API responses and allows:
    - Configuring response sequences
    - Simulating tool calls
    - Simulating errors
    - Tracking all calls for verification

    Example:
        provider = MockProvider()
        provider.add_text_response("Hello!")
        provider.add_tool_call("add", {"a": 1, "b": 2}, "3")

        agent = Agent(provider=provider)
        response = await agent.run("Hi")

        # Verify calls
        assert len(provider.calls) == 1
        assert provider.calls[0].system_prompt == "You are helpful."
    """

    def __init__(self):
        self.responses = []
        self.calls = []
        self.model = "mock-model"

    def add_text_response(self, text: str):
        """Add a text response to the queue."""
        self.responses.append({"type": "text", "content": text})

    def add_tool_call(self, name: str, arguments: dict, result: str):
        """Add a tool call response to the queue."""
        self.responses.append({
            "type": "tool_call",
            "name": name,
            "arguments": arguments,
            "result": result
        })

    def add_error(self, error: Exception):
        """Add an error response to the queue."""
        self.responses.append({"type": "error", "error": error})

    async def generate(self, system_prompt, tools, history):
        """Generate a mock response."""
        self.calls.append(MockCall(
            system_prompt=system_prompt,
            tools=tools,
            history=list(history)
        ))

        if not self.responses:
            return MockStreamedMessage([TextPart(text="Mock response")])

        response = self.responses.pop(0)
        if response["type"] == "error":
            raise response["error"]
        elif response["type"] == "text":
            return MockStreamedMessage([TextPart(text=response["content"])])
        elif response["type"] == "tool_call":
            return MockStreamedMessage([
                ToolCall(
                    id="call_123",
                    function=ToolCall.FunctionBody(
                        name=response["name"],
                        arguments=json.dumps(response["arguments"])
                    )
                )
            ])
```

## Test Configuration (pytest.ini)

```ini
[pytest]
testpaths = tests
asyncio_mode = auto
asyncio_default_fixture_loop_scope = function
pythonpath = src
addopts = -v --tb=short --strict-markers
markers =
    unit: Unit tests
    integration: Integration tests
    scenario: Real-world scenario tests
    slow: Slow tests
```

## Running Tests

```bash
# Run all tests
cd /home/tcmofashi/proj/general_agent/agentlite
pytest tests/

# Run with coverage
pytest tests/ --cov=agentlite --cov-report=html --cov-report=term

# Run specific test categories
pytest tests/unit/ -v
pytest tests/integration/ -v
pytest tests/scenarios/ -v

# Run with markers
pytest -m unit
pytest -m integration
pytest -m "not slow"

# Run specific test file
pytest tests/unit/test_message.py -v

# Run with debugging
pytest tests/ -v --pdb
```

## Commit Strategy

| After Task | Commit Message | Files |
|------------|----------------|-------|
| Task 1 | `test: setup pytest configuration and shared fixtures` | `tests/conftest.py`, `tests/utils.py` |
| Task 2 | `test: add unit tests for message types` | `tests/unit/test_message.py` |
| Task 3 | `test: add unit tests for tool system` | `tests/unit/test_tool.py` |
| Task 4 | `test: add unit tests for configuration models` | `tests/unit/test_config.py` |
| Task 5 | `test: add unit tests for provider protocol` | `tests/unit/test_provider.py` |
| Task 6 | `test: add mock provider for testing` | `tests/mocks/provider.py` |
| Task 7 | `test: add agent integration tests` | `tests/integration/test_agent.py` |
| Task 8 | `test: add tool calling loop tests` | `tests/integration/test_tool_loop.py` |
| Task 9 | `test: add streaming response tests` | `tests/integration/test_streaming.py` |
| Task 10 | `test: add conversation history tests` | `tests/integration/test_history.py` |
| Task 11 | `test: add data quality agent scenario tests` | `tests/scenarios/test_data_quality.py` |
| Task 12 | `test: add fact-checking agent scenario tests` | `tests/scenarios/test_fact_checking.py` |
| Task 13 | `test: add multi-agent workflow scenario tests` | `tests/scenarios/test_multi_agent.py` |
| Task 14 | `test: add MCP integration tests with mocks` | `tests/mocks/mcp_server.py`, `tests/integration/test_mcp.py` |
| Task 15 | `test: add error handling tests` | `tests/integration/test_errors.py` |
| Task 16 | `test: add coverage configuration and analysis` | `tests/.coveragerc` |

## Success Criteria

### Verification Commands

```bash
# All tests pass
pytest tests/ -v

# Coverage meets targets
pytest tests/ --cov=agentlite --cov-report=term-missing

# No import errors
python -c "import agentlite; print('OK')"

# Type checking passes (if mypy configured)
mypy src/agentlite/
```

### Final Checklist

- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] All scenario tests pass
- [ ] Coverage >= 80% overall
- [ ] Core modules >= 90% coverage
- [ ] All mocks work correctly
- [ ] Tests run without real API keys
- [ ] Tests are deterministic
- [ ] Tests are well-documented
- [ ] Test files follow naming convention

## Notes

1. **No Real API Calls**: All tests must work without real API keys, using mocks
2. **Deterministic**: Tests should produce consistent results
3. **Fast**: Unit tests should complete in < 1 second each
4. **Isolated**: Tests should not depend on each other
5. **Documented**: Complex scenarios should have docstrings explaining the use case