fix: Improve duplicate prevention in cast iron scanner

- Improve link normalization and duplicate checking
- Skip items already in seen_links and log each skipped duplicate
- Trim the state file to the last 500 items
- Always mark items as seen, whether or not they are deals

Also: eBay scraping is temporarily failing (blocked, or the page changed upstream); investigating.
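The duplicate-prevention steps in the commit message can be sketched as follows. This is a minimal Python sketch under stated assumptions: the scanner's language and code are not shown, so `normalize_link`, `process_items`, `is_deal`, the logger name, and the rule of dropping query strings (assumed to carry only tracking parameters) are all hypothetical. Only `seen_links` and the 500-item limit come from the commit.

```python
import logging
from urllib.parse import urlsplit, urlunsplit

# Hypothetical logger name; the real scanner's logging setup is not shown.
logger = logging.getLogger("cast_iron_scanner")

# The commit keeps the last 500 items in the state file.
MAX_SEEN = 500


def normalize_link(url: str) -> str:
    """Assumed normalization: lowercase scheme and host, drop the query
    string and fragment, and strip any trailing slash from the path."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def process_items(items, seen_links, is_deal):
    """Return new deals, mark every new item as seen, and trim seen_links
    in place to the most recent MAX_SEEN entries."""
    seen = {normalize_link(link) for link in seen_links}
    deals = []
    for item in items:
        link = normalize_link(item["link"])
        if link in seen:
            logger.info("Skipping duplicate: %s", link)
            continue
        # Record the link before the deal check so non-deals are not
        # re-evaluated on the next run.
        seen.add(link)
        seen_links.append(link)
        if is_deal(item):
            deals.append(item)
    # Drop the oldest entries; a no-op when there are MAX_SEEN or fewer.
    del seen_links[:-MAX_SEEN]
    return deals
```

Building the lookup set by normalizing every stored entry means links saved before normalization was tightened still match, and trimming after appending keeps the newest links rather than the oldest.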
2026-04-10 16:16:35 -04:00
parent 30703bfd45
commit 4bd829ca8c
9 changed files with 573 additions and 1002 deletions

@@ -2886,3 +2886,19 @@ No new leads found
[Fri Apr 10 11:00:02 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 11:18:09 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 11:18:09 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 12:00:02 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 12:00:02 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 12:19:24 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 12:19:24 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 13:00:02 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 13:00:02 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 14:00:02 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 14:00:02 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 14:15:36 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 14:15:36 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 15:00:02 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 15:00:02 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 15:24:12 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 15:24:12 EDT 2026] Response size: 7791 bytes
[Fri Apr 10 16:00:01 EDT 2026] ✓ hoaledgeriq.com/api/calc-submissions responding
[Fri Apr 10 16:00:01 EDT 2026] Response size: 7791 bytes

@@ -2325,3 +2325,35 @@
[2026-04-10T15:18:09Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T15:18:09Z] Processing calc submissions...
[2026-04-10T15:18:09Z] Check complete. Next run at 2026-04-10T12:18:EDT
[2026-04-10T16:00:00Z] Starting lead monitor check
[2026-04-10T16:00:02Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T16:00:02Z] Processing calc submissions...
[2026-04-10T16:00:02Z] Check complete. Next run at 2026-04-10T13:00:EDT
[2026-04-10T16:19:22Z] Starting lead monitor check
[2026-04-10T16:19:24Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T16:19:24Z] Processing calc submissions...
[2026-04-10T16:19:24Z] Check complete. Next run at 2026-04-10T13:19:EDT
[2026-04-10T17:00:00Z] Starting lead monitor check
[2026-04-10T17:00:02Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T17:00:02Z] Processing calc submissions...
[2026-04-10T17:00:02Z] Check complete. Next run at 2026-04-10T14:00:EDT
[2026-04-10T18:00:00Z] Starting lead monitor check
[2026-04-10T18:00:02Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T18:00:02Z] Processing calc submissions...
[2026-04-10T18:00:02Z] Check complete. Next run at 2026-04-10T15:00:EDT
[2026-04-10T18:15:34Z] Starting lead monitor check
[2026-04-10T18:15:36Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T18:15:36Z] Processing calc submissions...
[2026-04-10T18:15:36Z] Check complete. Next run at 2026-04-10T15:15:EDT
[2026-04-10T19:00:01Z] Starting lead monitor check
[2026-04-10T19:00:02Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T19:00:02Z] Processing calc submissions...
[2026-04-10T19:00:02Z] Check complete. Next run at 2026-04-10T16:00:EDT
[2026-04-10T19:24:10Z] Starting lead monitor check
[2026-04-10T19:24:12Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T19:24:12Z] Processing calc submissions...
[2026-04-10T19:24:12Z] Check complete. Next run at 2026-04-10T16:24:EDT
[2026-04-10T20:00:00Z] Starting lead monitor check
[2026-04-10T20:00:01Z] ROI Calc submissions response: 7791 bytes
[2026-04-10T20:00:01Z] Processing calc submissions...
[2026-04-10T20:00:01Z] Check complete. Next run at 2026-04-10T17:00:EDT

@@ -1,7 +1,7 @@
{
"processed_leads": [],
"processed_calc_ids": [1, 2, 3, 4],
-"last_check": "2026-04-10T15:18:09Z",
+"last_check": "2026-04-10T20:00:01Z",
"status": "active",
"notes": "Hourly monitoring enabled. Next check in 60 minutes."
}
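The state file in this hunk records `processed_calc_ids` and a UTC `last_check` timestamp. A minimal Python sketch of how a monitor run might update it follows; `apply_check` and `update_state_file` are hypothetical names, the real file path is not shown, and whether the actual monitor merges IDs this way is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def apply_check(state: dict, new_calc_ids, now: datetime) -> list:
    """Add unseen calc IDs to processed_calc_ids and stamp last_check.
    Returns the IDs that were new on this run."""
    processed = state.setdefault("processed_calc_ids", [])
    fresh = []
    for cid in new_calc_ids:
        if cid not in processed:
            processed.append(cid)
            fresh.append(cid)
    # Match the file's timestamp shape: UTC, second precision, "Z" suffix.
    state["last_check"] = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return fresh


def update_state_file(path: Path, new_calc_ids) -> list:
    """Hypothetical wrapper: load the JSON state, apply one check, save it."""
    state = json.loads(path.read_text())
    fresh = apply_check(state, new_calc_ids, datetime.now(timezone.utc))
    path.write_text(json.dumps(state, indent=2) + "\n")
    return fresh
```

Keeping the in-memory update separate from file I/O lets the merge and timestamp logic be checked without touching disk; the other keys (`status`, `notes`) are left untouched.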