
Constructor

About us
Constructor is the next-generation platform for search and discovery in e-commerce, built to explicitly optimize for metrics like revenue, conversion rate, and profit. Our search engine is built entirely in-house using transformers and generative LLMs, and we use its core search and personalization capabilities to power everything from search itself to recommendations to shopping agents. Engineering is by far our largest department, and we’ve built our proprietary engine to be the best on the market, having never lost an A/B test to a competing technology. We’re passionate about keeping it that way and work on the bleeding edge of AI to do so.
Out of necessity, our engine is built for extreme scale: it powers over 1 billion queries every day across X languages, with customers based in Y countries. It is used by some of the biggest e-commerce companies in the world, such as Sephora, Under Armour, and Petco.
We’re a passionate team who love solving problems and want to make our customers’ and coworkers’ lives better. We value empathy, openness, curiosity, and continuous improvement, and we are excited by metrics that matter. We believe that empowering everyone in a company to do what they do best can lead to great things.
Constructor is a U.S.-based company that has been in the market since 2019. It was founded by Eli Finkelshteyn and Dan McCormick, who still lead the company today.
Job Description
The Constructor Data Platform is a foundational component for all internal data and ML teams. It handles the ingestion of over 1 TB of compressed events daily and manages over 6 PB of data in our data lake.
The Data Platform:
- Is a comprehensive set of tools and infrastructure used daily by every data scientist and ML engineer in our company.
- Implements public-facing APIs for event ingestion (FastAPI) and real-time analytics (ClickHouse, Cube); a sketch of such an endpoint follows this list.
- Manages data storage in appropriate formats (S3, ClickHouse, Delta).
- Facilitates data processing using technologies such as Python, Spark/Databricks, ClickHouse, AWS Lambda, and Kinesis.
- Includes robust monitoring solutions (Prometheus, OpenTelemetry, PagerDuty, Sentry).
- Ensures automated testing of pipelines and data quality.
- Provides cost observability and optimization capabilities.
- Offers comprehensive tools for developers to develop, run, test, and schedule data pipelines, along with all necessary support and documentation.
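To give a flavor of the ingestion side, here is a minimal sketch of a FastAPI event-ingestion endpoint. The endpoint path, event schema, and 202 response are illustrative assumptions, not Constructor’s actual API:

    # Minimal event-ingestion sketch; endpoint path and schema are hypothetical.
    from datetime import datetime, timezone

    from fastapi import FastAPI, status
    from pydantic import BaseModel

    app = FastAPI()

    class Event(BaseModel):
        # Hypothetical minimal event schema.
        user_id: str
        event_type: str
        payload: dict = {}

    @app.post("/v1/events", status_code=status.HTTP_202_ACCEPTED)
    async def ingest_event(event: Event) -> dict:
        # A production pipeline would forward the event to a stream
        # (e.g. Kinesis) rather than acknowledge it inline.
        return {"status": "accepted",
                "received_at": datetime.now(timezone.utc).isoformat()}

Returning 202 Accepted keeps the endpoint fast; the heavy lifting happens downstream in the stream consumers.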
Our platform is developed by the Data Lake Team and the Data Infrastructure Team.
About the Data Infrastructure Team
We’re hiring a Senior Data Engineer to work on our Data Infrastructure Team. This team is responsible for:
- Job scheduling and orchestration for data pipelines.
- Deployment and management of BI tools.
- Real-time analytics infrastructure (ClickHouse, AWS Lambda, Cube.js, and related tooling).
- Real-time log ingestion and processing, including data compliance.
- Core data services (e.g., Kubernetes, Ray, metadata services) and enterprise-wide observability solutions (based on ClickHouse and OpenTelemetry).
We are seeking an engineer with at least 4 years of experience who has strong programming skills (ideally in Python) and expertise in big data engineering, web services, and cloud platforms (ideally AWS). We are looking for someone eager to build diverse components and drive the evolution of our platform while working closely with our users. Excellent English communication skills and a strong computer science background are firm requirements.
You will contribute to building various data platform components, actively incorporate user feedback, and proactively drive improvements. Here are some of the projects you may be involved with:
- Implement a real-time, cross-regional event processing solution that ingests data into multiple data storage systems (a sketch of one such consumer follows this list).
- Deploy OpenMetadata within Kubernetes for use by data pipelines.
- Design ClickHouse tables to store logs for company services and define a way to interact with them via Grafana.
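As a rough illustration of the first project, here is a sketch of an AWS Lambda handler consuming events from a Kinesis stream; the payload fields and the print standing in for the downstream writes are assumptions:

    # Sketch of a Kinesis-triggered Lambda; payload fields are hypothetical.
    import base64
    import json

    def handler(event, context):
        # Kinesis delivers records base64-encoded under event["Records"].
        records = event.get("Records", [])
        for record in records:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            # A real consumer would route each event to one or more
            # storage systems (e.g. S3, ClickHouse, Delta).
            print(payload.get("event_type"), payload.get("user_id"))
        return {"processed": len(records)}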