Work With Us
In the following, we give you a blueprint of how we will engage with you. Our approach distills our experience from a wide variety of projects we have completed over time. Two key points:
- Our workflow uses extensive data exploration combined with well-designed experiments.
- We will establish a constant feedback loop with your business experts.
The first point reflects that data science is closer to applied research than to engineering: the outcome is uncertain at the beginning, and we systematically de-risk as we progress with our work. The second point addresses the need to solve the "right" problems - and to build tools that maximize business impact.
A typical full engagement runs 6 to 8 weeks. Larger projects can be scoped into multiple steps.
There are four distinct phases:
- Initial Conversation:
We take an hour or two to talk about your goals, preferably in person.
- Scoping:
For a few days, we do a deep dive into your data, metadata, and business requirements. The goal is both to check feasibility and to develop a sound Statement of Work (SOW). The SOW includes a realistic timeline with milestones, success metrics, and go/no-go decision points. We usually work at an hourly rate during this phase, as we are still getting to know your data and infrastructure.
- Project Work:
Our research and development work starts once a usable compute environment is ready and all your data is accessible within it. We then work as outlined in the SOW. Typically we hold hour-long touch points once a week - ideally in person if you are in the NYC area; otherwise video calls work well. We expect you to include business subject matter experts in these discussions as needed. We won't fill your calendar with meetings: minor status updates and answers to your questions go through our basecamp.com workspace, which keeps all documents and discussions in one place and makes progress visible to everyone. We often switch to milestone-based pricing for this phase.
- Knowledge Transfer:
We take time to collect, document, and hand over all artifacts. We generally deliver our results as a well-documented git repository for code, a PDF summary report, and a final in-person presentation for all stakeholders. We will help your team understand what is needed to deploy the models and how the tools we developed integrate with your workflow. Finally, we gather feedback and discuss next steps and future opportunities.
We use Python, Docker, and Kubernetes for our work and to deploy models into production. We maintain our own secure research and development server which can handle computationally demanding projects (32-core AMD EPYC, 512 GB RAM, 2x NVIDIA Titan RTX GPU, 8+ TB NVMe SSD storage), and we can scale out as needed using cloud infrastructure. We have worked with all kinds of environments in the past - from IBM System i, AWS/Google/Digital Ocean, and Cloudera, to company laptops connected via VPN to in-house clusters.
Should you want to provide an environment yourself, we would expect the following as a minimal configuration:
- up-to-date Ubuntu Linux with typical developer tools installed
- Python 3.6 or newer, with the ability to install additional Python packages using conda
- 64 GB RAM, 8 core CPU
- Nvidia V100/Titan RTX GPU
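As a rough illustration of how such a machine can be checked against these minimums, here is a small Python sketch (the function names and the report format are our own; this is not part of any standard tooling, and the RAM check is Linux-specific):

```python
import os
import platform
import sys


def python_ok(min_version=(3, 6)):
    # True if the interpreter meets the minimum version requirement
    return sys.version_info[:2] >= min_version


def cpu_cores():
    # Number of logical CPU cores visible to this process
    return os.cpu_count() or 0


def total_ram_gb():
    # Total system RAM in GiB, read from /proc/meminfo (Linux only);
    # returns 0.0 if the file is unavailable
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    kb = int(line.split()[1])
                    return kb / (1024 ** 2)
    except OSError:
        pass
    return 0.0


if __name__ == "__main__":
    print(f"Python {platform.python_version()} (need 3.6+: {python_ok()})")
    print(f"CPU cores: {cpu_cores()} (need 8+)")
    print(f"RAM: {total_ram_gb():.0f} GiB (need 64+)")
```

GPU availability is easiest to verify separately with `nvidia-smi`, which lists the installed NVIDIA devices and driver version.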