1 min read · from Data Science

Which platform do you use to execute your code?

Our take

When you modernize your data analysis workflows, selecting the right platform for executing code is crucial. The move from legacy tools to Python quickly exposes the challenge of managing large datasets, so it’s essential to find a solution that accommodates your needs without being hindered by traditional IT constraints. Tools like Posit Workbench and Databricks could be a good fit. If you’re grappling with similar issues, consider exploring our article "Advice? My boss wants me to stop making Shiny apps and instead hand off the front end to a software engineer."

The transition from legacy toolsets to more agile platforms is a pressing issue for many organizations, especially in sectors like banking where data management is critical. In a recent discussion, a data professional described their move towards Python and the difficulty of establishing an effective environment for model development and execution. The crux of their struggle lies in balancing the need for robust data analysis capabilities with existing IT governance structures, which often lean heavily on traditional software development lifecycle (SDLC) standards. This situation isn't unique: many in the data science community face the same dilemma of modernizing while working within organizational frameworks that may not align with the iterative nature of data analytics. For further insights, see the post titled "Advice? My boss wants me to stop making Shiny apps and instead hand off the front end to a software engineer." It touches on the tension between traditional roles and data-driven development.

The challenge, as noted in the conversation, is that the sheer volume of data makes local execution impractical. As data scientists look for platforms like Posit Workbench or Databricks to facilitate their workflows, they encounter resistance from IT departments that prioritize regulatory compliance over innovative exploration. This disconnect highlights a broader theme in the industry: the necessity for organizations to bridge the gap between evolving data needs and established IT policies. It raises important questions about how institutions can adapt their governance frameworks to support more dynamic data initiatives without compromising security or compliance.

Moreover, such discussions underscore the importance of fostering a culture that values data-driven insights over rigid adherence to outdated processes. Legacy systems, while historically reliable, can stifle creativity and limit the potential for transformative data applications. As companies like banks transition to more flexible, cloud-based solutions, they must also advocate for a shift in mindset among IT teams. This could entail reimagining SDLC processes to better accommodate the iterative nature of data science and analytics, which often require rapid experimentation and adaptation. A related discussion appears in the article "How do I find and fix a “Cannot find #REF!#REF!” error?" It illustrates the complexities that can arise even in seemingly simple processes.

Looking ahead, the ongoing evolution of data management platforms will likely prompt further discussions on best practices for collaboration between data science and IT departments. As organizations increasingly rely on sophisticated analytics to drive decision-making, they must cultivate frameworks that embrace flexibility, innovation, and user-centric design. The implications of these changes extend beyond immediate technical challenges; they signify a shift towards a future where data-driven insights are not only accessible but also integral to strategic operations. This raises an essential question: How will organizations redefine their collaboration models to harness the full potential of data, ensuring both compliance and innovation? The answer will shape the future landscape of data science and analytics, making it an exciting area to watch.

I'm interested in hearing how people here execute their code. Are they cloud hosted or on-prem?

I work in a bank, and we are aiming to get off our legacy toolset and into Python. The challenge is getting an environment where we can run and develop our models. Our data is too big to handle on a laptop, so we are looking for some sort of platform to execute code on.

We have looked into standing up our own servers where we can run code, but IT is adamant that we be subject to SDLC standards. That makes sense for traditional application development, but it's not super applicable to data analysis and model development workflows. They don't seem to understand that our "application" is a data cruncher that we can use to generate insights.

I've looked at tools like Posit Workbench or Databricks that I think would fit our needs but I'm interested in hearing how other companies enable their data scientists to execute their code.

submitted by /u/a157reverse