Code Security Best Practices: Protecting Your Projects When Collaborating and Using AI

Pieces 🌟 · Nov 27 '23 · Dev Community

Two-thirds of application developers ship code with known security vulnerabilities. The backstory to this startling statistic is unsurprising: thirty-six percent of developers cite delivery deadlines as the reason. With the rise of artificial intelligence (AI) and machine learning tools for code enhancement, developers may have new opportunities to improve both the efficiency and the security of software creation.

This opportunity for expert co-creation has the potential to significantly improve code security and the overall software development process. AI-powered tools can:

  • Offer context-aware code completion suggestions, making it easier and faster to write code
  • Assist in the code review process by automatically flagging potential issues, such as performance bottlenecks
  • Scan code for security vulnerabilities and suggest fixes
  • Identify issues like SQL injection risks, data leaks, and other security threats, helping to make code more secure (see the sketch after this list)
  • Search and navigate large codebases, making it easier to find relevant code snippets, functions, or modules, thereby speeding up the development process
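
To make the SQL injection item concrete, here's a minimal illustration of the kind of issue an AI reviewer might flag, along with the standard fix. The table and inputs are invented for the example:

```python
import sqlite3

# Toy database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "nobody' OR '1'='1"  # hostile input a scanner would flag

# Vulnerable: user input is interpolated directly into the SQL string.
rows = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'"
).fetchall()
print("vulnerable query returned:", rows)  # leaks a row despite the bogus name

# Fixed: a parameterized query treats the input as data, never as SQL.
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()
print("parameterized query returned:", rows)  # returns nothing
```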

Intelligent code completion and auto-enrichment features, if used with appropriate oversight, may give developers more time and encourage practices that can overcome barriers to code security. This article provides insight into the uses of AI for code improvement and security best practices. You'll get a high-level view of how to use AI to improve security and learn about some AI blind spots.

Should My Organization Use AI?

The answer is almost certainly yes. Artificial intelligence makes many programming tasks easier: it can fetch helpful snippets, debug thousands of lines of source code with blazing speed, and autocomplete accurately from context. Despite these advantages, many organizations are restricting its use. For example, Samsung banned the use of generative AI tools like ChatGPT on company-owned devices after an internal data leak. This may make you think twice about adopting AI.

The reason for these restrictions is the shared nature of large AIs. Public AI tools like ChatGPT are incredibly powerful and can assist in various tasks, but the nature of their growth can introduce risk when you give them intellectual property and confidential data.

Large language models (LLMs) and other AI tools improve themselves by learning from the data they process. This means that the proprietary or sensitive information you input could potentially be used to further train the model. While this is generally done in a way that anonymizes the data, the risk of data leakage still exists.

However, if you want to use AI and mitigate risk, there are several security practices that can improve its outcomes.

Consider Local or On-Device AI

It won't always be possible to provide sufficient context and content to a public AI while maintaining the confidentiality of your intellectual property. Combining the need for code security with the need for AI may call for running large language models (LLMs) or small language models locally, on-device.

Tools like Pieces offer significant advantages in terms of both code security and efficiency while keeping data local to the organization or even the desktop. An even more compelling reason to opt for local tools is the enhanced data privacy they offer. When data is stored in the cloud, it becomes susceptible to a variety of risks, including unauthorized access and data breaches.

By keeping all data and computations on your device, these risks are substantially mitigated. And by leveraging on-device LLMs like Llama 2 with Pieces Copilot, you can generate code and ask questions of your personal repositories, without compromising your network security.
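
To give a feel for what on-device generation looks like, here's a minimal sketch using the open source llama-cpp-python runtime with a locally downloaded Llama 2 model file. The model path is illustrative, and Pieces manages its local models for you; this only shows the general shape of the approach:

```python
# Minimal on-device generation with the open source llama-cpp-python runtime.
# The model path is illustrative -- any local Llama 2 GGUF file works.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

prompt = "Write a Python function that validates an email address."
result = llm(prompt, max_tokens=256, temperature=0.2)

# The prompt and the completion never leave this machine.
print(result["choices"][0]["text"])
```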

Store Secrets in a Central Location

Managing sensitive information like API keys or passwords is a critical aspect of secure development. Developers often resort to convenient but insecure methods for storing secrets, such as environment variables, configuration files, or even plain text files.

These methods are susceptible to accidental commits to public repositories and pose significant security risks, and such practices have led to real breaches. For example, a recent news story about the potential leak of all Binance API keys highlighted how mismanaged secrets can result in massive data breaches and financial losses.

Specialized tools like Pieces offer centralized storage for secrets. A single, secure location where all sensitive information is stored and managed eliminates the need to remember where each secret lives, making it easier for developers to manage multiple projects or work within larger teams.

Secure storage is not just about keeping secrets inaccessible to unauthorized users; it's also about ensuring that the secrets are encrypted and protected from potential vulnerabilities. Specialized tools often use robust encryption algorithms and multifactor authentication to add layers of security.
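
As a simple illustration of moving a secret out of source code, the sketch below uses the open source keyring package, which delegates storage to the operating system's credential vault. The service and key names are invented for the example:

```python
# A minimal sketch using the open source `keyring` package, which stores
# secrets in the operating system's credential vault rather than in code.
# The service and key names here are illustrative.
import keyring

# One-time setup, run from a bootstrap script -- never committed to git:
keyring.set_password("my-project", "payments_api_key", "sk-example-123")

# At runtime, fetch the secret from the central store:
api_key = keyring.get_password("my-project", "payments_api_key")
if api_key is None:
    raise RuntimeError("payments_api_key not found in the system keyring")
```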

Review AI-Generated Code for Security Flaws and Bad Logic

AI-generated code can be a powerful tool for accelerating development, but it's not without pitfalls. AI will not always make appropriate choices when suggesting code for your applications: it is not fully context-aware and lacks the insight of a human developer. A tool like Pieces lets you generate more contextualized code through Retrieval-Augmented Generation (RAG) and the ability to set custom context for each conversation based on your personal repos, snippets, and other developer materials.

You should always scrutinize the code generated by AI for potential security flaws and logical errors. For instance, a cloud-based LLM that's not trained to recognize the need for security might generate code that includes an unauthenticated API call. Such a flaw could lead to unauthorized data access and pose a significant security risk. Additionally, AI-generated code can sometimes introduce logical errors, such as a sorting algorithm that doesn't handle edge cases well.
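
To make the unauthenticated-call example concrete, here's a before-and-after sketch; the endpoint URL and token name are hypothetical:

```python
# Before/after sketch of the unauthenticated API call described above.
# The endpoint URL and environment variable name are hypothetical.
import os
import requests

BASE_URL = "https://api.example.com/v1/reports"

# What a context-blind model might generate: no auth and no timeout,
# so anyone who discovers the endpoint can pull the same data.
def fetch_reports_insecure():
    return requests.get(BASE_URL).json()

# Reviewed version: a credential from the environment (or a secrets store),
# an explicit timeout, and a check that the server accepted the request.
def fetch_reports():
    token = os.environ["REPORTS_API_TOKEN"]
    resp = requests.get(
        BASE_URL,
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```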

Compensating for AI-generated problems is similar to compensating for human-introduced problems. Secure code best practices are the same regardless of origin: use purpose-built development security tools, review the code for vulnerabilities, explore the dependencies, keep third-party tools updated, and understand secure coding standards. Secure development relies on these activities, and neither a human nor an AI co-developer provides a secure base without them.

Incorporate Active Code Scanning Tools

Active code security scanning tools are the fastest and most reliable way to inject security into the code pipeline. They are especially valuable with AI-provided code because they don't require deep familiarity with the specifics of the codebase. These tools automatically scan your codebase for known vulnerabilities, bad practices, and logical errors and provide immediate feedback, enabling you to catch and fix issues early in the development process and reducing the risk of security breaches.

Code security tools like SonarQube or Checkmarx can automatically scan your codebase for vulnerabilities and bad logic. Ideally, the active scanning tool should be inserted right after the code is generated and before it's merged into the main codebase. This ensures that any code security vulnerabilities are caught early, reducing the risk of flawed code making it into production.
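
As one tool-agnostic way to wire a scanner into that spot in the pipeline, the sketch below runs Bandit, an open source Python security scanner, as a pre-merge gate; SonarQube and Checkmarx offer their own CLIs that slot in the same way:

```python
# A minimal pre-merge gate, assuming the open source Bandit scanner
# (`pip install bandit`) is on the PATH. Bandit's -r flag scans a directory
# recursively; -ll limits the report to medium-severity findings and above.
import subprocess
import sys

def run_security_scan(path: str = "src/") -> int:
    result = subprocess.run(["bandit", "-r", path, "-ll"])
    return result.returncode  # nonzero means findings (or a scan error)

if __name__ == "__main__":
    exit_code = run_security_scan()
    if exit_code != 0:
        print("Security findings detected -- blocking merge.", file=sys.stderr)
    sys.exit(exit_code)
```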

For more insights into best practices when using AI-generated code, you can check out guidelines on using AI in your development or DevOps pipeline.

Regardless of the practices or tools used, remember to approach AI-generated code with a critical eye. According to an article on freeCodeCamp, you should complement AI-generated code with manual programming. Carefully review and test all auto-generated code to ensure it meets your project's standards for both logic and security, even after it has been scanned.

Review Dependencies

Incorporating third-party dependencies into your project can save time and effort, but it's crucial to exercise caution, especially when these dependencies are recommended by AI. AI tools may suggest third-party libraries or frameworks that seem to fit your project's needs. However, it's essential to review these recommendations carefully. Ensure that the dependencies are actively maintained, have strong community backing, and don't have known security vulnerabilities.

For example, the Log4j vulnerability serves as a cautionary tale about the risks of not thoroughly vetting third-party dependencies. Unless instructed otherwise, AI does not consider whether a package is offered from a secure source or whether announcements about corrupted third-party packages might apply to your request.

To simplify this process, consider using open source dependency scanning tools like Snyk Open Source. These tools can automatically identify vulnerabilities in your dependencies and even suggest fixes or alternatives. Once you've decided to use a particular dependency, make sure to keep it updated. Outdated libraries can introduce security risks and compatibility issues. Automated tools can help you stay on top of updates and ensure your project remains secure and efficient.

Always test the latest versions of dependencies in a controlled environment before deploying them to production. Tools like Dependabot will ensure you always have the most recent updates by automatically creating pull requests.
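
To show what an automated dependency check does under the hood, the sketch below queries the public OSV.dev vulnerability database for a single pinned package; tools like Snyk and Dependabot run this kind of lookup across your entire manifest. The package and version are illustrative:

```python
# Minimal dependency vulnerability lookup against the public OSV.dev API.
# The package name and version below are illustrative.
import requests

def known_vulnerabilities(name: str, version: str, ecosystem: str = "PyPI"):
    resp = requests.post(
        "https://api.osv.dev/v1/query",
        json={"package": {"name": name, "ecosystem": ecosystem},
              "version": version},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("vulns", [])

for vuln in known_vulnerabilities("jinja2", "2.4.1"):
    print(vuln["id"], "-", vuln.get("summary", "no summary available"))
```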

Identify Needed Updates in AI-Generated Code

AI-generated code may include calls to libraries or APIs. To ensure you're using the latest versions, cross-reference the versions in the generated code with the latest releases on the respective repositories or official websites. Some AI tools also provide notifications or flags for outdated dependencies. If you're not using automated tools, you can manually check for updates by visiting the official websites or repositories of the libraries you're using.

Subscribe to mailing lists or follow social media channels that announce updates. Package managers like npm also include commands such as npm outdated to identify and list outdated dependencies.
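
For Python projects, pip offers the same information; a small sketch that shells out to pip's JSON output and prints anything stale might look like this:

```python
# List outdated Python dependencies by shelling out to pip;
# `npm outdated` plays the same role for JavaScript projects.
import json
import subprocess

output = subprocess.check_output(
    ["pip", "list", "--outdated", "--format=json"], text=True
)
for pkg in json.loads(output):
    print(f"{pkg['name']}: {pkg['version']} -> {pkg['latest_version']}")
```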

Learn about Secure Coding Standards

If you're just getting started with secure coding, resources like the OWASP Top Ten can provide a good starting point. Websites like Secure Code Warrior offer interactive training that can help you understand common vulnerabilities and how to avoid them. A thorough secure code review process should involve multiple team members with different areas of expertise. This ensures that the code is examined from various perspectives, increasing the likelihood of catching any issues.

Use Code Completion Features in IDE Environments

Code completion features in IDEs can significantly speed up your development process. They provide real-time suggestions as you type, reducing the need for manual lookups and decreasing the likelihood of typos and errors.

As with code security, though, a public AI does not always apply context awareness to code completion, even when it purports to be tailored for coding. Efficient coding includes reuse, but a public AI may suggest code that is not secure, appropriate, or even useful in response to your query, no matter how skilled you are at prompting.

If you need context, security, and traceability, you can save time by choosing an IDE with an AI copilot optimized to conform to good coding standards. This is an area of focus for Pieces, with its unique approach to code completion and generation.

Unlike other tools that may suggest code from a broad database, Pieces only offers autocomplete based on your saved snippets and items you choose from its built-in collection. This ensures that the suggestions are tailored to your coding style and project needs, reducing the chance of introducing errors. This option not only speeds up your workflow but also ensures that the code you're writing is more accurate and appropriate for your projects.

Additionally, Pieces' automatic enrichment of saved snippets makes it easier to find where a snippet came from. The AI automatically adds related links, including the origin source, so you or your teammates can always double-check the source before continuing the project or confirm it's the right snippet. This enrichment also tags collaborators on your code, so anyone who edits the code is credited, which makes it easier to credit contributors and makes your code more difficult to plagiarize.

Furthermore, you can leverage Pieces Copilot within your IDE to generate contextualized code with a local LLM that doesn't require an internet connection at all. This air-gapped system enables even the most secure enterprise environments to leverage the benefits of AI.

The following is a simple Python script to check for RPM updates on a Linux system, saved as a code snippet in Pieces Desktop:

A Python script that checks for RPM updates, a helpful snippet for any RPM-based system.
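
For reference, a minimal sketch of such a check (assuming an RPM-based system with dnf on the PATH, and relying on dnf check-update's documented exit codes) might look like this:

```python
# Sketch of an RPM update check for a dnf-based system.
# `dnf check-update` exits 0 (no updates), 100 (updates available), 1 (error).
import subprocess
import sys

result = subprocess.run(["dnf", "check-update"], capture_output=True, text=True)
if result.returncode == 100:
    print("Updates available:")
    print(result.stdout)
elif result.returncode == 0:
    print("System is up to date.")
else:
    print("dnf reported an error:", result.stderr, file=sys.stderr)
    sys.exit(1)
```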

Pieces can automatically annotate the code by reading the developer's comments. But it goes further and adds annotations about the functionality:

Pieces automatically annotates the code without any need to duplicate code comments into documentation.

Pieces also offers Pieces Copilot, a natural-language chatbot that lets you ask the AI about the code itself (for example, about its content or metadata) right within the IDE. For instance, you can ask not just about the function of the script but also how the script executes its function:

Pieces Copilot explains the code in natural language, on demand.

Have an Incident Management Process

An unfortunate truth is that code and information systems that seem secure today will inevitably have vulnerabilities revealed tomorrow. Those vulnerabilities may even be exploited before they are announced or patched by the vendor. As the Log4j vulnerability mentioned earlier showed, a security issue can go from invisible to widely exploited without warning. The origin of the vulnerability makes no difference during the management of an incident.

Whether the insecurity is introduced by flawed technology or inappropriate coding practices, your organization must respond promptly and with a prerehearsed plan.

Incident response plans should be in place, practiced, and revised regularly to account for the ever-changing technology environment. This preparation isn't limited to insecure code; every IT professional should expect, and be prepared, to participate in security incident response during their career.

Conclusion

Code security and efficiency are critical in today's technological landscape, and developers are under tremendous pressure to release new functionality quickly. The older model of quarterly updates along with the occasional security patch is no longer the reality for application developers. Functionality rarely waits for a measured review, and AI is an efficient partner that can—but often does not—improve the security of your source code.

There is no shortcut that removes the need to apply code security best practices. But leveraging tools like Pieces helps developers improve their development process while ensuring robust protection for their codebase. You can rely on Pieces Copilot to generate code, document metadata, keep your secrets local, and help you locate and reuse snippets from previous programming projects. Download the desktop application and see how quickly Pieces improves your efficiency, without compromising code security.

Learn more about the evolution of AI in software development.
