==================== Spatial Join Fuzzer ==================== Overview ======== The Spatial Join Fuzzer tests the correctness of the SpatialJoin operator by generating random geometry data and spatial join plans. It verifies that SpatialJoin produces the same results as NestedLoopJoin for equivalent queries. Supported Features ================== Join Types ---------- The fuzzer tests the two join types supported by SpatialJoin (as defined in ``SpatialJoinNode::isSupported()``): * **INNER** - Only matching rows from both sides * **LEFT** - All rows from left side, matched rows from right side Spatial Predicates ------------------ The fuzzer tests these spatial predicates: * ``ST_Intersects(geometry1, geometry2)`` - Tests if geometries intersect * ``ST_Contains(geometry1, geometry2)`` - Tests if one geometry contains another * ``ST_Within(geometry1, geometry2)`` - Tests if one geometry is within another * ``ST_Distance(geometry1, geometry2) < threshold`` - Tests distance with threshold Geometry Types -------------- The fuzzer generates Well-Known Text (WKT) strings for three geometry types: * **POINT** - Single coordinate point (e.g., ``POINT (10.5 20.3)``) * **POLYGON** - Closed shape with vertices * **LINESTRING** - Line segment between two points Distribution Patterns --------------------- Geometries are generated using three distribution patterns: * **Uniform** - Geometries uniformly distributed in space (0-1000 range) * **Clustered** - Geometries grouped in 5 specific regions to test overlap scenarios * **Sparse** - Geometries widely spread (0-2000 range) with low overlap probability Implementation Details ====================== Geometry Generation ------------------- Geometries are generated using ``AbstractInputGenerator`` subclasses: * ``PointInputGenerator`` - Generates POINT WKT strings * ``PolygonInputGenerator`` - Generates POLYGON WKT strings * ``LineStringInputGenerator`` - Generates LINESTRING WKT strings Each generator implements the ``generate(vector_size_t index)`` method to produce geometry strings based on the distribution pattern. **Uniform Distribution**:: x = random(0, 1000) y = random(0, 1000) POINT (x y) **Clustered Distribution**:: cluster = row % 5 // 5 clusters centerX = cluster * 200 + 100 centerY = cluster * 200 + 100 x = centerX + random(-50, 50) y = centerY + random(-50, 50) POINT (x y) **Sparse Distribution**:: x = random(0, 2000) // Larger Range y = random(0, 2000) POINT (x y) Data Matching Strategy ---------------------- To ensure some matches occur during joins: * Build side copies ~30% of geometries from probe side * 10% chance of empty build side to test edge cases Verification ------------ The fuzzer compares results from two equivalent plans: 1. **SpatialJoin plan** - Using the specialized SpatialJoin operator 2. **NestedLoopJoin plan** - Using NestedLoopJoin with the same spatial predicate as a filter Results must match exactly, validating that SpatialJoin implements spatial predicates correctly. Key Differences from JoinFuzzer ================================ Join Conditions --------------- Unlike regular joins with simple equality predicates:: // Regular join probe.id = build.id // Spatial join ST_Intersects(probe_geom, build_geom) Spatial joins use **function call expressions** as join conditions rather than simple column references.