MySQL Conference – Building a Vertical Search Engine in a Day

Moving into the user sessions on the first day at MySQL Conference 2007, I attended "Building a Vertical Search Engine in a Day".

Some of my notes for reference.

Web Crawling 101

  • Injection List – the seed URLs you are starting from
  • Fetching the pages
  • Parsing the content – words and links
  • Updating the crawl DB
  • Whitelist
  • Blacklist
  • Convergence — avoiding the honey pots
  • Index
  • Map-reduce — split a large problem into little pieces, process in parallel, then combine results
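The steps above can be sketched as a small loop. This is only an illustration of the general idea, not how Nutch implements it: the in-memory `FAKE_WEB` dictionary, the host lists, and the function names are all hypothetical stand-ins so that the example runs without touching the network.

```python
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "http://www.mysql.com/": (
        "mysql database server",
        ["http://dev.mysql.com/doc/", "http://spam.example/trap"],
    ),
    "http://dev.mysql.com/doc/": (
        "mysql reference manual",
        ["http://www.mysql.com/", "http://other.example/"],
    ),
    "http://spam.example/trap": ("buy now", ["http://spam.example/trap2"]),
    "http://other.example/": ("unrelated page", []),
}

WHITELIST = {"www.mysql.com", "dev.mysql.com"}  # hosts we want to crawl
BLACKLIST = {"spam.example"}                    # hosts we never crawl


def allowed(url):
    """A URL is crawlable if its host is whitelisted and not blacklisted."""
    host = urlparse(url).hostname
    return host in WHITELIST and host not in BLACKLIST


def crawl(seeds, max_loops=10):
    """Run fetch/parse/update loops starting from the injection list."""
    # Crawl DB: URL -> status. Seeded from the injection list.
    crawl_db = {url: "unfetched" for url in seeds if allowed(url)}
    index = {}  # word -> set of URLs containing it

    for _ in range(max_loops):
        batch = [u for u, status in crawl_db.items() if status == "unfetched"]
        if not batch:
            break  # nothing new was discovered, so the crawl has converged
        for url in batch:
            page = FAKE_WEB.get(url)  # "fetch" the page
            if page is None:
                crawl_db[url] = "gone"
                continue
            text, links = page
            crawl_db[url] = "fetched"
            for word in text.split():  # parse words into the index
                index.setdefault(word, set()).add(url)
            for link in links:  # parse links and update the crawl DB
                if allowed(link) and link not in crawl_db:
                    crawl_db[link] = "unfetched"
    return crawl_db, index


crawl_db, index = crawl(["http://www.mysql.com/"])
```

Each pass of the outer loop corresponds to one crawl "loop": take what the crawl DB lists as unfetched, fetch it, and feed the newly found links back in. The `max_loops` cap and the whitelist are two simple ways to keep the crawl from wandering off indefinitely.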

Focused content == vertical crawl

  • 20 Billion Pages out there, a lot of junk
  • Breadth-first would take years and cost millions

OPIC + Term Vectors = Depth-first

  • OPIC is “On-line Page Importance Calculation” (see the “Fixing OPIC Scoring” paper)
  • Pure OPIC means “Fetch well-linked pages first”
  • We modify it to “fetch pages about MySQL first”
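A rough sketch of how a topic bias might be layered on an OPIC-style ordering follows. It is heavily simplified and is not the speakers' implementation: each page is fetched only once, the `TOPIC` vector, the toy `WEB` graph, and the function names are hypothetical, and cosine similarity stands in for whatever term-vector comparison was actually used. The idea shown is that a fetched page passes its "cash" to its outlinks, and scaling that cash by the page's relevance to the topic pushes links from on-topic pages to the front of the queue.

```python
import math
from collections import Counter

# Hypothetical topic term vector for "pages about MySQL".
TOPIC = Counter({"mysql": 3, "database": 2, "sql": 1})

# Hypothetical link graph: URL -> (page text, outgoing links).
WEB = {
    "seed": ("mysql database news", ["a", "b"]),
    "a": ("mysql sql database tuning", ["c", "e"]),
    "b": ("celebrity gossip", ["d"]),
    "c": ("mysql replication", []),
    "d": ("more gossip", []),
    "e": ("sql joins", []),
}


def cosine(a, b):
    """Cosine similarity between two term vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def crawl_order(web, seed, use_topic=True):
    """Return the order in which pages are fetched, highest cash first."""
    cash = {seed: 1.0}  # unfetched URL -> accumulated cash
    order = []
    while cash:
        # Pick the URL with the most cash; break ties alphabetically.
        url = min(cash, key=lambda u: (-cash[u], u))
        amount = cash.pop(url)
        order.append(url)
        text, links = web[url]
        # Pure OPIC passes on all the cash; the focused variant scales it
        # by how closely the page matches the topic.
        weight = cosine(Counter(text.split()), TOPIC) if use_topic else 1.0
        targets = [l for l in links if l in web and l not in order]
        for link in targets:
            cash[link] = cash.get(link, 0.0) + amount * weight / len(targets)
    return order


pure = crawl_order(WEB, "seed", use_topic=False)
focused = crawl_order(WEB, "seed", use_topic=True)
```

In this toy graph, the unweighted ordering fetches `d` (reached only through the off-topic page `b`) before `c` and `e`, while the topic-weighted ordering fetches `c` and `e` first and leaves `d` for last.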

Nutch and Hadoop are the technologies used, running on a 4-server cluster. A sample crawl starting with www.mysql.com ran 23 loops, with 150k pages fetched and 2M URLs found.

Serving up the results

Tagged with: Databases, General, MySQL, MySQL Conference & Expo 2007
