A large organization with specific data needs and a wide range of datasets is best served in the long term by using its data to deliver useful and innovative insights about its business context, and to deliver them at a profit.
This, in turn, calls for two distinct approaches:
- A “data laboratory,” staffed with data scientists, who question everything; a loose structure that promotes collaboration; a longer-term focus; and a culture that values creativity and the pursuit of “deeper understanding”. Think here of the great Industrial Age labs, such as Bell Laboratories.
- A “data factory,” staffed by process engineers and others with deep technical skills who “get the job done”; a tight structure that promotes consistency, scale, and decreasing unit cost; a shorter-term focus; and a culture that values quality and revenue above all else. Think here of the manufacturing counterparts of the labs referenced above.
To succeed with the data lab, companies must:
- Create an open, questioning, collaborative environment.
- Nurture a critical mass of data scientists; provide them with access to lots of data, state-of-the-art tools, and time to dream up and work through hundreds of hypotheses, most of which will not yield insight. They should also have the opportunity to hone the ones that do.
- Build a management team that can point data scientists in fruitful directions, and
- Learn to tolerate risk, while delivering a steady stream of insights that improve existing products and services.
The factory
The factory carries out the tasks beyond the lab's scope. These include creating a product or service from an insight, figuring out how to deliver and support it, scaling it up, and dealing with special cases and mistakes, all to realize a profit. It therefore requires:
- A sense of urgency
- Discipline and coordination
- Project plans and schedules
- Higher levels of automation and repeatability
- More people with a wider variety of skill sets
- A more rigid environment, and
- Different sorts of metrics.
- Creativity and experimentation
While it is useful to be cognizant of this distinction, it is counterproductive to take the ‘factory’ and ‘lab’ metaphors too far. The two need to work closely and seamlessly together.
The 5 Key Challenges to Building a Successful Data Science Lab & Data Team
Building a successful data science lab and data team is a challenge for many companies, which must answer the following questions both at start-up and during operation:
- Goals – Why do you need them? Are there concrete goals for them to work towards, or do you want to appease your investors? What do you expect them to do in a year?
- Roles and team composition – What role should I hire for, given my goals? Data engineer, data scientist, analyst, data communicator? What is the eventual team composition that I’m building towards? What portion of the team should be recruited internally and then trained in data science principles and techniques?
- Team lead – Do I have a strong team lead who can identify, attract and recruit great data scientists, and who can work with the founders to define the company’s data strategy? Expect them to spend 50% of the time on recruiting and interviewing, at least initially.
- Organization and infrastructure – Where will the data team fit in your organization? Can data scientists do their job as soon as you hire them, or do they have to set up your data infrastructure first?
- Recruiting and retention – Data scientists are in demand. What do you offer that others can’t? Do you offer unique learning opportunities, unique data, social mission, career growth?
4 strategies for sustainable lab design
Successful laboratories are the result of extensive planning, collaboration, and coordination between the design team and all impacted stakeholders.
- Long-term data management
Computational research and digital modeling of experiments enable researchers to rapidly test ideas virtually, replicating only the most promising results physically in the wet lab. Facilities are making greater use of on-site data centers to process the massive amounts of data generated by these tests. Cloud computing is the next data management frontier, allowing labs to keep minimal on-site data storage while reaping the benefits of limitless capacity, continually updated technology, saved space and reduced utility bills.
- Resiliency
Resiliency is at the forefront of facility design due to climate change impacts, such as increased flooding, natural disasters and extreme temperatures. A resilient lab is more than a typical lab; it reflects long-term thinking to ensure the building can maintain operations, structural integrity and safety during the worst possible weather events. Introducing natural ventilation into certain spaces within the lab also conserves energy.
- Building dashboards/User Interface
Some building systems use remote monitoring and algorithms to automatically detect issues and suggest corrective action to facilities staff. Other systems rely on providing immediate graphic feedback to users, educating and engaging them in energy-reduction efforts. These graphic displays can be used in the lab to help change user behaviors. One example is cfm counters that display real-time lab exhaust rates. When fume hood sashes are open, the cfm display goes up. Such fun, low-cost programs, which engage users in meaningful ways, pay huge dividends over a building’s life.
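As a rough illustration of the monitoring idea described above, the following Python sketch sums exhaust readings the way a lab-wide cfm counter might, and suggests a corrective action when a hood exceeds a limit. Everything in it is an assumption made for illustration: the `HoodReading` fields, the `suggest_action` and `total_exhaust` helpers, the hood names, and the 500 cfm limit are hypothetical and do not come from any real building-management system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HoodReading:
    hood_id: str        # hypothetical identifier for a fume hood
    exhaust_cfm: float  # measured exhaust rate, cubic feet per minute
    sash_open: bool     # whether the sash is currently open


def suggest_action(reading: HoodReading, limit_cfm: float) -> Optional[str]:
    """Return a suggested corrective action, or None if the reading is within the limit."""
    if reading.exhaust_cfm <= limit_cfm:
        return None
    if reading.sash_open:
        return f"{reading.hood_id}: close the sash to reduce exhaust"
    return f"{reading.hood_id}: exhaust high with sash closed, inspect the hood"


def total_exhaust(readings: List[HoodReading]) -> float:
    """Sum the exhaust rates, as a lab-wide cfm counter would display them."""
    return sum(r.exhaust_cfm for r in readings)


# Illustrative readings only; the values and the limit are made up.
readings = [
    HoodReading("hood-A", 900.0, True),
    HoodReading("hood-B", 300.0, False),
]
alerts = [
    action
    for action in (suggest_action(r, limit_cfm=500.0) for r in readings)
    if action is not None
]
```

In a real facility the readings would come from building sensors and the limit would be set by facilities staff; the sketch only shows how a display value and a suggested action could be derived from the same data.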
- Collaboration/Resource sharing
Collaboration focuses on sharing ideas with other research teams within the organization to increase the rate of discovery. When resources are shared, less space is needed—equating to lower operational and maintenance cost.
Harnessing the Data Revolution to Achieve the Sustainable Development Goals
To achieve the Sustainable Development Goals, the data revolution must be harnessed. This can be realized through:
- Addressing the “crisis of non-existent, inaccessible or unreliable data.”
- Responding to the unprecedented increase in the volume and types of data, and the subsequent demand for them.
- Building basic knowledge and awareness of the value of data, then focusing specifically on public-private partnerships, opportunities, and constraints regarding the collection and use of data.
- Building domestic institutional capacity to use and maintain new technologies, understand and analyze the data collected, and identify and implement change based on that analysis.
- Addressing capacity constraints at all levels
- Creating the appropriate enabling environment for leapfrog data technologies to have transformational impact
- Confronting data sharing, ownership, and privacy concerns and securing consensus on them
- Navigating complex technical environments and creating an environment in which leapfrog data technologies can flourish
- Having a solid foundation of infrastructure, skills,
- The ability to collect and utilize accurate data—however small or large
- Increasing funding for the data revolution and coordinating donor efforts.
- Increasing funding for capacity building as part of an expansion of broader development priorities
References
- http://thegovlab.org/harnessing-the-data-revolution-to-achieve-the-sustainable-development-goals/
- https://www.labdesignnews.com/article/2014/12/10-strategies-sustainable-lab-design
- https://insidebigdata.com/2017/01/18/the-5-key-challenges-to-building-a-successful-data-science-lab-data-team/
- https://hbr.org/2013/04/two-departments-for-data-success