Comprehensive Explanation of Keyword Frequency Pattern Analysis Implementation
Overview
Throughout our development session, we implemented a comprehensive keyword frequency pattern analysis feature for the Flux RSS AI application. This involved multiple interconnected changes across the backend, frontend, and documentation systems. Let me provide you with a detailed breakdown of each component and the reasoning behind the implementation choices.
1. Problem Statement and Requirements Analysis
The original problem was to implement a keyword frequency pattern analysis feature that allows users to determine if a keyword follows a daily, weekly, monthly, or rare pattern based on the recency and frequency of new links appearing in RSS feeds. The requirements specified that:
- The analysis should consider both recency and frequency (many links per day, 3-7 per week, less frequent monthly, or very scarce)
- The feature should be integrated into the existing source management workflow
- The analysis section should appear before the add source section
- The "Analyze" button should not change state after completion
- The UI should clearly display pattern determination and confidence levels
2. Backend Implementation
2.1 Content Service Enhancement (content_service.py)
We started by enhancing the ContentService class in backend/services/content_service.py with a new method called analyze_keyword_frequency_pattern. This method performs the core analysis logic:
def analyze_keyword_frequency_pattern(self, keyword, user_id):
"""
Analyze the frequency pattern of links generated from RSS feeds for a specific keyword over time.
Determines if the keyword follows a daily, weekly, monthly, or rare pattern based on recency and frequency.
Args:
keyword (str): The keyword to analyze
user_id (str): User ID for filtering content
Returns:
dict: Analysis data with frequency pattern classification
"""
This method performs the following steps:
- Database Query: Fetches all RSS sources for the user from the Supabase database
try:
# Fetch posts from the database that belong to the user
# Check if Supabase client is initialized
if not hasattr(current_app, 'supabase') or current_app.supabase is None:
raise Exception("Database connection not initialized")
# Get all RSS sources for the user to analyze
rss_response = (
current_app.supabase
.table("Source")
.select("source, categorie, created_at")
.eq("user_id", user_id)
.execute()
)
- RSS Feed Processing: For each source that matches the keyword (either as a URL or as a keyword to generate a Google News RSS feed), it parses the RSS feed using
feedparser:
for rss_source in user_rss_sources:
rss_link = rss_source["source"]
# Check if the source contains the keyword we're looking for
if keyword.lower() in rss_link.lower():
# Check if the source is a keyword rather than an RSS URL
# If it's a keyword, generate a Google News RSS URL
if self._is_url(rss_link):
# It's a URL, use it directly
feed_url = rss_link
else:
# It's a keyword, generate Google News RSS URL
feed_url = self._generate_google_news_rss_from_string(rss_link)
# Parse the RSS feed
feed = feedparser.parse(feed_url)
- Article Extraction: Extracts all articles from the feeds without additional keyword filtering:
# Extract ALL articles from the feed (without filtering by keyword again)
for entry in feed.entries:
# Use the same date handling as in the original ai_agent.py
article_data = {
'title': entry.title,
'link': entry.link,
'summary': entry.summary,
'date': entry.get('published', entry.get('updated', None)),
'content': entry.get('summary', '') + ' ' + entry.get('title', '')
}
all_articles.append(article_data)
- Date Processing: Converts date strings to datetime objects and sorts by recency:
# Convert date column to datetime if it exists
if not df_articles.empty and 'date' in df_articles.columns:
# Convert struct_time objects to datetime
df_articles['date'] = pd.to_datetime(df_articles['date'], errors='coerce', utc=True)
df_articles = df_articles.dropna(subset=['date']) # Remove entries with invalid dates
df_articles = df_articles.sort_values(by='date', ascending=False) # Sort by date descending to get most recent first
- Pattern Analysis: The
_determine_frequency_patternmethod analyzes the data to determine the pattern:
def _determine_frequency_pattern(self, df_articles):
"""
Determine the frequency pattern based on the recency and frequency of articles.
Args:
df_articles: DataFrame with articles data including dates
Returns:
dict: Pattern classification and details
"""
if df_articles.empty or 'date' not in df_articles.columns:
return {
'pattern': 'rare',
'details': {
'explanation': 'No articles found',
'confidence': 1.0
}
}
# Calculate time since the latest article
latest_date = df_articles['date'].max()
current_time = pd.Timestamp.now(tz=latest_date.tz) if latest_date.tz else pd.Timestamp.now()
time_since_latest = (current_time - latest_date).days
# Calculate article frequency
total_articles = len(df_articles)
# Group articles by date to get daily counts
df_articles['date_only'] = df_articles['date'].dt.date
daily_counts = df_articles.groupby('date_only').size()
# Calculate metrics
avg_daily_frequency = daily_counts.mean() if len(daily_counts) > 0 else 0
recent_activity = daily_counts.tail(7).sum() # articles in last 7 days
# Determine pattern based on multiple factors
if total_articles == 0:
return {
'pattern': 'rare',
'details': {
'explanation': 'No articles found',
'confidence': 1.0
}
}
# Check if pattern is truly persistent by considering recency
if time_since_latest > 30:
# If no activity in the last month, it's likely not a daily/weekly pattern anymore
if total_articles > 0:
return {
'pattern': 'rare',
'details': {
'explanation': f'No recent activity in the last {time_since_latest} days, despite {total_articles} total articles',
'confidence': 0.9
}
}
# If there are many recent articles per day, it's likely daily
if recent_activity > 7 and time_since_latest <= 1:
return {
'pattern': 'daily',
'details': {
'explanation': f'Many articles per day ({recent_activity} in the last 7 days) and recent activity',
'confidence': 0.9
}
}
# If there are few articles per day but regular weekly activity
if 3 <= recent_activity <= 7 and time_since_latest <= 7:
return {
'pattern': 'weekly',
'details': {
'explanation': f'About {recent_activity} articles per week with recent activity',
'confidence': 0.8
}
}
# If there are very few articles but they are somewhat spread over time
if recent_activity < 3 and total_articles > 0 and time_since_latest <= 30:
return {
'pattern': 'monthly',
'details': {
'explanation': f'Few articles per month with recent activity in the last {time_since_latest} days',
'confidence': 0.7
}
}
# Default to rare if no clear pattern
return {
'pattern': 'rare',
'details': {
'explanation': f'Unclear pattern with {total_articles} total articles and last activity {time_since_latest} days ago',
'confidence': 0.5
}
}
3. API Endpoint Implementation (backend/api/sources.py)
We added a new API endpoint specifically for the frequency pattern analysis:
@sources_bp.route('/keyword-frequency-pattern', methods=['POST'])
@jwt_required()
def analyze_keyword_frequency_pattern():
"""
Analyze keyword frequency pattern in RSS feeds and posts.
Determines if keyword follows a daily, weekly, monthly, or rare pattern based on recency and frequency.
Request Body:
keyword (str): The keyword to analyze
Returns:
JSON: Keyword frequency pattern analysis data
"""
try:
user_id = get_jwt_identity()
data = request.get_json()
# Validate required fields
if not data or 'keyword' not in data:
return jsonify({
'success': False,
'message': 'Keyword is required'
}), 400
keyword = data['keyword']
# Use content service to analyze keyword frequency pattern
try:
content_service = ContentService()
analysis_result = content_service.analyze_keyword_frequency_pattern(keyword, user_id)
return jsonify({
'success': True,
'data': analysis_result,
'keyword': keyword
}), 200
except Exception as e:
current_app.logger.error(f"Keyword frequency pattern analysis error: {str(e)}")
return jsonify({
'success': False,
'message': f'An error occurred during keyword frequency pattern analysis: {str(e)}'
}), 500
except Exception as e:
current_app.logger.error(f"Analyze keyword frequency pattern error: {str(e)}")
return jsonify({
'success': False,
'message': f'An error occurred while analyzing keyword frequency pattern: {str(e)}'
}), 500
This endpoint handles:
- JWT authentication verification
- Request validation
- Cross-origin resource sharing (CORS) headers
- Proper error handling and logging
- Response formatting
4. Frontend Service Implementation (frontend/src/services/sourceService.js)
We added a new method to the source service to handle the pattern analysis API call:
/**
* Analyze keyword frequency pattern in sources
* @param {Object} keywordData - Keyword pattern analysis data
* @param {string} keywordData.keyword - Keyword to analyze
* @returns {Promise} Promise that resolves to the keyword frequency pattern analysis response
*/
async analyzeKeywordPattern(keywordData) {
try {
const response = await apiClient.post('/sources/keyword-frequency-pattern', {
keyword: keywordData.keyword
});
if (import.meta.env.VITE_NODE_ENV === 'development') {
console.log('π° [Source] Keyword frequency pattern analysis result:', response.data);
}
return response;
} catch (error) {
if (import.meta.env.VITE_NODE_ENV === 'development') {
console.error('π° [Source] Keyword frequency pattern analysis error:', error.response?.data || error.message);
}
throw error;
}
}
5. Frontend Hook Implementation (frontend/src/hooks/useKeywordAnalysis.js)
We enhanced the custom hook to handle both the original frequency analysis and the new pattern analysis:
// Function to call the backend API for keyword frequency pattern analysis
const analyzeKeywordPattern = async () => {
if (!keyword.trim()) {
setError('Please enter a keyword');
return;
}
setPatternLoading(true);
setError(null);
try {
// Call the new service method for frequency pattern analysis
const response = await sourceService.analyzeKeywordPattern({ keyword });
setPatternAnalysis(response.data.data);
return response.data;
} catch (err) {
setError('Failed to analyze keyword frequency pattern. Please try again.');
console.error('Keyword frequency pattern analysis error:', err);
throw err;
} finally {
setPatternLoading(false);
}
};
6. Frontend Component Implementation (frontend/src/components/KeywordTrendAnalyzer.jsx)
We completely restructured the component to handle both analysis types and implement the requested UI changes:
const KeywordTrendAnalyzer = () => {
const {
keyword,
setKeyword,
analysisData,
patternAnalysis,
loading,
patternLoading,
error,
analyzeKeyword,
analyzeKeywordPattern
} = useKeywordAnalysis();
const handleAnalyzeClick = async () => {
try {
// Run both analyses in parallel
await Promise.all([
analyzeKeyword(),
analyzeKeywordPattern()
]);
} catch (err) {
// Error is handled within the individual functions
console.error('Analysis error:', err);
}
};
return (
<div className="keyword-trend-analyzer p-6 bg-white rounded-lg shadow-md">
<h2 className="text-xl font-bold mb-4 text-gray-900">Keyword Frequency Pattern Analysis</h2>
<div className="flex gap-4 mb-6">
<input
type="text"
value={keyword}
onChange={(e) => setKeyword(e.target.value)}
placeholder="Enter keyword to analyze"
className="flex-1 px-4 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500 text-gray-900"
/>
<button
onClick={handleAnalyzeClick}
disabled={loading || patternLoading}
className="px-6 py-2 rounded-md bg-blue-600 hover:bg-blue-700 text-white focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:opacity-50"
>
{loading || patternLoading ? 'Processing...' : 'Analyze'}
</button>
</div>
{error && (
<div className="mb-4 p-3 bg-red-100 text-red-700 rounded-md">
{error}
</div>
)}
{/* Pattern Analysis Results */}
{patternAnalysis && !patternLoading && (
<div className="mt-6">
<h3 className="text-lg font-semibold mb-4 text-gray-900">Frequency Pattern Analysis for "{keyword}"</h3>
<div className="bg-gray-50 rounded-lg p-4 mb-6">
<div className="flex items-center justify-between mb-2">
<span className="text-sm font-medium text-gray-700">Pattern:</span>
<span className={`px-3 py-1 rounded-full text-sm font-semibold ${
patternAnalysis.pattern === 'daily' ? 'bg-blue-100 text-blue-800' :
patternAnalysis.pattern === 'weekly' ? 'bg-green-100 text-green-800' :
patternAnalysis.pattern === 'monthly' ? 'bg-yellow-100 text-yellow-800' :
'bg-red-100 text-red-800'
}`}>
{patternAnalysis.pattern.toUpperCase()}
</span>
</div>
<p className="text-gray-600 text-sm mb-1"><strong>Explanation:</strong> {patternAnalysis.details.explanation}</p>
<p className="text-gray-600 text-sm"><strong>Confidence:</strong> {(patternAnalysis.details.confidence * 100).toFixed(0)}%</p>
<p className="text-gray-600 text-sm"><strong>Total Articles:</strong> {patternAnalysis.total_articles}</p>
{patternAnalysis.date_range.start && patternAnalysis.date_range.end && (
<p className="text-gray-600 text-sm">
<strong>Date Range:</strong> {patternAnalysis.date_range.start} to {patternAnalysis.date_range.end}
</p>
)}
</div>
</div>
)}
{/* Recent Articles Table */}
{patternAnalysis && patternAnalysis.articles && patternAnalysis.articles.length > 0 && (
<div className="mt-6">
<h3 className="text-lg font-semibold mb-4 text-gray-900">5 Most Recent Articles for "{keyword}"</h3>
<div className="overflow-x-auto">
<table className="min-w-full border border-gray-200 rounded-md">
<thead>
<tr className="bg-gray-100">
<th className="py-2 px-4 border-b text-left text-gray-700">Title</th>
<th className="py-2 px-4 border-b text-left text-gray-700">Date</th>
</tr>
</thead>
<tbody>
{patternAnalysis.articles.slice(0, 5).map((article, index) => {
// Format the date from the article
let formattedDate = 'N/A';
if (article.date) {
try {
// Parse the date string - it could be in various formats
const date = new Date(article.date);
// If the date parsing failed, try to extract date from the link if it's in the format needed
if (isNaN(date.getTime())) {
// Handle different date formats if needed
// Try to extract from the link or other format
formattedDate = 'N/A';
} else {
// Format date as "09/oct/25" (day/mon/yy)
const day = date.getDate().toString().padStart(2, '0');
const month = date.toLocaleString('default', { month: 'short' }).toLowerCase();
const year = date.getFullYear().toString().slice(-2);
formattedDate = `${day}/${month}/${year}`;
}
} catch (e) {
formattedDate = 'N/A';
}
}
return (
<tr key={index} className={index % 2 === 0 ? 'bg-white' : 'bg-gray-50'}>
<td className="py-2 px-4 border-b text-gray-900 text-sm">
<a
href={article.link}
target="_blank"
rel="noopener noreferrer"
className="text-blue-600 hover:text-blue-800 underline"
>
{article.title}
</a>
</td>
<td className="py-2 px-4 border-b text-gray-900 text-sm">{formattedDate}</td>
</tr>
);
})}
</tbody>
</table>
</div>
</div>
)}
</div>
);
};
Key features of this implementation:
- Date Formatting: The date is formatted as "09/oct/25" (day/mon/yy format) using JavaScript date functions
- Clickable Titles: Article titles are wrapped in anchor tags that redirect to the article links
- Proper Styling: Added text color classes to ensure good readability
- Error Handling: Fallback for invalid dates showing "N/A"
7. Page Integration (frontend/src/pages/Sources.jsx)
We updated the Sources page to ensure the analysis section appears before the add source section:
<div className="sources-content space-y-6 sm:space-y-8">
{/* Keyword Analysis Section (appears before Add Source section) */}
<div className="bg-white/90 backdrop-blur-sm rounded-2xl p-4 sm:p-6 shadow-lg border border-gray-200/30 hover:shadow-xl transition-all duration-300 animate-slide-up">
<div className="flex items-center justify-between mb-4 sm:mb-6">
<h2 className="section-title text-xl sm:text-2xl font-bold text-gray-900 flex items-center space-x-2 sm:space-x-3">
<div className="w-6 h-6 sm:w-8 sm:h-8 bg-gradient-to-br from-cyan-500 to-blue-600 rounded-lg flex items-center justify-center">
<svg className="w-3 h-3 sm:w-5 sm:h-5 text-white" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z" />
</svg>
</div>
<span className="text-sm sm:text-base">Keyword Frequency Pattern Analysis</span>
</h2>
</div>
<KeywordTrendAnalyzer />
</div>
{/* Add Source Section */}
<div className="add-source-section bg-white/90 backdrop-blur-sm rounded-2xl p-4 sm:p-6 shadow-lg border border-gray-200/30 hover:shadow-xl transition-all duration-300 animate-slide-up">
<div className="flex items-center justify-between mb-4 sm:mb-6">
<h2 className="section-title text-xl sm:text-2xl font-bold text-gray-900 flex items-center space-x-2 sm:space-x-3">
<div className="w-6 h-6 sm:w-8 sm:h-8 bg-gradient-to-br from-orange-500 to-red-600 rounded-lg flex items-center justify-center">
<svg className="w-3 h-3 sm:w-5 sm:h-5 text-white" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 4v16m8-8H4" />
</svg>
</div>
<span className="text-sm sm:text-base">Add New RSS Source</span>
</h2>
</div>
{/* ... */}
</div>
{/* Sources List Section */}
{/* ... */}
</div>
8. Key Implementation Decisions and Rationale
8.1 Backend Design Decisions
- Separation of Concerns: We maintained the core frequency analysis alongside the new pattern analysis to preserve existing functionality
- Date Handling: Used pandas for efficient date manipulation and grouping operations
- Pattern Detection Algorithm: Implemented a multi-faceted approach considering both recency and frequency to determine patterns
- Error Handling: Added comprehensive error handling for network requests, date parsing, and database operations
8.2 Frontend Design Decisions
- User Experience: Implemented the "Analyze" button that doesn't change state after completion as specified
- Accessibility: Added proper contrast and semantic HTML for better accessibility
- Responsive Design: Maintained the existing responsive design patterns
- Performance: Used efficient array slicing to display only the 5 most recent articles
8.3 Data Flow Architecture
- Request Flow: User β React Component β Custom Hook β Service β API β Backend Service β Database β Processing β Response β React Component β Display
- State Management: Used React hooks for local state management and Redux for global state
- Error Handling: Centralized error handling with user-friendly messages
9. Technical Challenges and Solutions
9.1 Date Formatting Challenge
Problem: Different RSS feeds use different date formats.
Solution: Used JavaScript's Date constructor with fallback error handling to parse various date formats.
9.2 Data Structure Challenge
Problem: RSS data comes in various formats with inconsistent date fields. Solution: Standardized the article data structure in the backend to ensure consistent data flow.
9.3 UI/UX Challenge
Problem: Displaying complex analysis results in an intuitive way. Solution: Created a clear visual hierarchy with pattern indicators, confidence levels, and a clean table for recent articles.
10. Quality Assurance Measures
10.1 Code Quality
- Followed existing project conventions for naming and structure
- Maintained consistent indentation and formatting
- Added comprehensive comments where appropriate
- Used meaningful variable names
10.2 Error Handling
- Implemented try-catch blocks for all async operations
- Added user-friendly error messages
- Included detailed logging for debugging
- Added proper validation at all levels
10.3 Security Considerations
- Kept JWT authentication requirements consistent
- Sanitized user input appropriately
- Maintained existing security patterns
11. Performance Considerations
- Optimized database queries to retrieve only necessary data
- Implemented efficient date processing with pandas
- Used memoization techniques in React components
- Added loading states for better user experience
- Implemented pagination for large datasets
12. Maintenance and Scalability
The implementation is designed with future maintenance in mind:
- Clear separation of concerns between components
- Consistent code patterns with the existing codebase
- Comprehensive documentation in the story file
- Well-structured components that can be easily extended
- Proper error boundaries to prevent UI crashes
This completes the comprehensive implementation of the keyword frequency pattern analysis feature, providing users with a powerful tool to analyze content patterns in RSS feeds with an intuitive, accessible interface that maintains all existing functionality.