Which platform do you use to execute your code?
Our take
The transition from legacy toolsets to more agile platforms is a pressing issue for many organizations, especially in sectors like banking where data management is critical. In a recent discussion, a data professional described their move toward Python and the difficulty of establishing an effective environment for model development and execution. The crux of their struggle is balancing the need for robust data analysis capabilities against existing IT governance, which often leans heavily on traditional software development lifecycle (SDLC) standards. The situation isn't unique: many in the data science community are trying to modernize within organizational frameworks that do not match the iterative nature of data analytics. A related discussion, "Advice? My boss wants me to stop making Shiny apps and instead hand off the front end to a software engineer.", touches on the same tension between traditional roles and data-driven development.
The challenge, as noted in the conversation, is that the sheer volume of data makes local execution impractical. As data scientists look to platforms like Posit Workbench or Databricks to support their workflows, they can meet resistance from IT departments that insist on applying SDLC standards designed for traditional application development. This disconnect points to a broader theme in the industry: organizations need to bridge the gap between evolving data needs and established IT policies. It raises the question of how institutions can adapt their governance frameworks to support more dynamic data initiatives without compromising security or compliance.
Moreover, such discussions underscore the importance of fostering a culture that values data-driven insights over rigid adherence to outdated processes. Legacy systems, while historically reliable, can stifle creativity and limit the potential for transformative data applications. As banks and similar institutions move to more flexible, cloud-based solutions, they also need a shift in mindset among IT teams. This could mean reworking SDLC processes to better accommodate the iterative nature of data science and analytics, which often require rapid experimentation and adaptation.
Looking ahead, the ongoing evolution of data management platforms will likely prompt further discussion of best practices for collaboration between data science and IT departments. As organizations increasingly rely on sophisticated analytics to drive decision-making, they must cultivate frameworks that embrace flexibility, innovation, and user-centric design. The implications extend beyond immediate technical challenges; they point toward a future where data-driven insights are not only accessible but also integral to strategic operations. This raises an essential question: how will organizations redefine their collaboration models to harness the full potential of data while ensuring both compliance and innovation? The answer will shape the future of data science and analytics.
I'm interested in hearing how people here execute their code. Are they cloud hosted or on-prem?
I work in a bank, and we are aiming to get off our legacy toolset and into Python. The challenge is getting an environment where we can run and develop our models. Our data is too big to handle on a laptop, so we are looking for some sort of platform to execute code on.
We have looked into standing up our own servers where we can run code, but IT is adamant that we be subject to SDLC standards. That makes sense for traditional application development, but it's not super applicable to data analysis and model development workflows. They don't seem to understand that our "application" is a data cruncher that we can use to generate insights.
I've looked at tools like Posit Workbench or Databricks that I think would fit our needs but I'm interested in hearing how other companies enable their data scientists to execute their code.