1. Designing for Security and Compliance. Considerations include:
1. Identity and access management (e.g., Cloud IAM).
2. Data security (encryption, key management).
3. Ensuring privacy (e.g., Data Loss Prevention API).
4. Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children's Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR)).
Since data engineers work with diverse sets of data, they will likely need to use a variety of data stores that use access controls. They also should be prepared to work with sensitive data that needs additional protections. This chapter introduces several key topics of security and compliance, including:
1. Identity and access management.
2. Data security, including encryption and key management.
3. Data loss prevention.
4. Compliance.
We'll begin with identity and access management, because it is fundamental to many security practices.
Identity and Access Management with Cloud IAM
Cloud IAM is Google Cloud's fine-grained identity and access management service that is used to control which users can perform operations on resources within GCP. Cloud IAM uses the concept of roles, which are collections of permissions that can be assigned to identities. Permissions are granted to roles, and then individuals can be assigned multiple roles to gain these permissions. Cloud IAM provides a large number of roles tuned to common use cases, such as server administrators or database operators.
Along with roles, additional attributes about resources or identities, such as IP address and date and time, can be considered when making access control decisions; this is known as context-aware access.
Cloud IAM maintains an audit log of changes to permissions, including authorizing, removing, and delegating permissions.
This chapter describes key aspects of Cloud IAM that you should understand, including:
1. Predefined roles,
2. Custom roles,
3. Using roles with service accounts,
4. Access controls with policies.
Together, these constitute the ways that you can control access to resources in GCP. In GCP, users, groups, service accounts, and G Suite domains are authorized to access resources by granting those identities roles. As noted earlier, roles are collections of permissions. When a role is granted to an identity, that identity is granted all the permissions in that role. You do not directly assign permissions to an identity; identities get permissions via roles.
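The relationship between identities, roles, and permissions can be sketched in a few lines of Python. This is only an illustrative model, not how Cloud IAM is implemented; the role names, permission names, and identities below are hypothetical.

```python
from typing import Dict, Set

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "roles/example.viewer": {"example.items.get", "example.items.list"},
    "roles/example.editor": {
        "example.items.get",
        "example.items.list",
        "example.items.update",
    },
}

# Identities are granted roles, never individual permissions.
IDENTITY_ROLES: Dict[str, Set[str]] = {
    "user:alice@example.com": {"roles/example.viewer"},
    "group:data-eng@example.com": {"roles/example.viewer", "roles/example.editor"},
}


def permissions_for(identity: str) -> Set[str]:
    """Return the union of permissions from every role granted to the identity."""
    perms: Set[str] = set()
    for role in IDENTITY_ROLES.get(identity, set()):
        perms |= ROLE_PERMISSIONS.get(role, set())
    return perms


def is_allowed(identity: str, permission: str) -> bool:
    """An identity holds a permission only if one of its roles contains it."""
    return permission in permissions_for(identity)
```

Granting a second role to an identity simply widens the set returned by permissions_for; there is no step where a single permission is attached to the identity directly.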
GCP uses three types of roles:
1. Primitive roles,
2. Predefined roles,
3. Custom roles.
Primitive roles include the Owner, Editor, and Viewer roles, which existed prior to the introduction of Cloud IAM. These roles apply at the project level and so are considered coarse-grained access controls.
1. The Viewer role grants read-only access to resources.
2. The Editor role includes all Viewer permissions plus the ability to modify the state of a resource.
3. The Owner role includes all Editor role permissions and permissions to manage roles and permissions, along with setting up billing for a project.
In general, you shouldn't use primitive roles except in cases where coarse-grained access controls are acceptable. For example, you could use primitive roles to grant access to developers in a development environment, since the developers would be responsible for administering the development environment.
Predefined Roles
Predefined roles are generally associated with a GCP service, such as App Engine or BigQuery, and a set of related activities, such as editing data in a database or deploying an application to App Engine.
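The naming convention for roles is to start the role name with roles/ followed by a string that identifies the service, such as appengine, and then a name for the job function the role supports, such as deployer or user. The permissions inside a role follow a related convention: the service name, followed by the type of entity to which the permission applies, such as instance or table, followed by an operation, such as: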
1. get,
2. list,
3. create.
Let's look at a couple of examples. The roles/appengine.deployer role grants read-only access to all application and configuration settings and write access to create new versions. This role doesn't provide permission to modify existing applications except for deleting versions that are no longer receiving traffic. The permissions included in this role are as follows (App Engine):
As you can see, the naming convention for permissions is the name of the service followed by a resource type specific to that service and an action on resources of that type. The asterisk in this example indicates all types of actions applicable to the operation's resource, such as get, list, and create.
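The permission naming convention and the wildcard can be illustrated with a short Python sketch. It assumes a pattern of the form appengine.operations.* and uses shell-style matching from the standard library as a stand-in; it is not how Cloud IAM evaluates permissions internally.

```python
from fnmatch import fnmatchcase
from typing import Tuple


def parse_permission(permission: str) -> Tuple[str, str, str]:
    """Split a permission name into (service, resource type, action)."""
    service, resource_type, action = permission.split(".", 2)
    return service, resource_type, action


def permission_matches(pattern: str, permission: str) -> bool:
    """Check a concrete permission against a pattern that may end in '*'."""
    return fnmatchcase(permission, pattern)


# The wildcard covers every action on the 'operations' resource type only.
print(parse_permission("appengine.versions.create"))
print(permission_matches("appengine.operations.*", "appengine.operations.get"))
print(permission_matches("appengine.operations.*", "appengine.versions.get"))
```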
As a second example, BigQuery has a user role called roles/bigquery.user that grants permissions to run queries and other jobs within a project. Users can list their own jobs and datasets as well as create new datasets.

Many services have similar sets of roles having a similar set of permissions, often including admins, viewers, and some kind of worker roles. For example, the roles available with the following GCP services include:
1. Cloud Dataproc: roles/dataproc.editor, roles/dataproc.viewer, roles/dataproc.admin, and roles/dataproc.worker
2. Cloud Dataflow: roles/dataflow.admin, roles/dataflow.developer, roles/dataflow.viewer, and roles/dataflow.worker
3. Cloud Bigtable: roles/bigtable.admin, roles/bigtable.user, roles/bigtable.viewer, and roles/bigtable.reader
4. BigQuery: roles/bigquery.admin, roles/bigquery.connectionAdmin, roles/bigquery.connectionUser, roles/bigquery.dataEditor, roles/bigquery.dataOwner, roles/bigquery.dataViewer, roles/bigquery.jobUser, roles/bigquery.metadataViewer, roles/bigquery.readSessionUser, and roles/bigquery.user
Note that BigQuery uses fine-grained permissions on BigQuery resources, such as connections, metadata, and sessions.
Custom Roles

In addition to primitive and predefined roles, GCP allows for the use of custom-defined roles. With custom roles, you can assign one or more permissions to a role and then assign that role to a user, group, or service account.
Custom roles are especially important when implementing the principle of least privilege, which states that users should be granted the minimal set of permissions needed for them to perform their jobs. Because of this, you may want to grant someone a different subset or combination of permissions than what is available in the predefined roles.
Users must have the 'iam.roles.create' permission to be able to create a custom role. In addition to typical name, description, and identifier parameters, you can also specify a role launch stage, which can be alpha, beta, general availability, or deprecated. Custom roles usually start in the alpha stage and are then promoted to the beta or general availability stage after sufficient testing. The deprecated stage is used to indicate to users that the role shouldn't be used.
When creating a custom role, you can select from permissions assigned to predefined roles. This is helpful if you want someone to have a more limited version of a predefined role, in which case you can start with the list of permissions in a predefined role and select only the permissions that you would like to grant.
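As a rough sketch of that workflow, the following Python builds a custom role definition by keeping only a chosen subset of a predefined role's permissions. The field names (title, description, stage, includedPermissions) follow the role definition format used by Cloud IAM; the helper name build_custom_role and the sample permission list are illustrative assumptions, and the list is not the full content of any predefined role.

```python
from typing import Any, Dict, Iterable

# Launch stages named in the text: alpha, beta, general availability, deprecated.
LAUNCH_STAGES = ("ALPHA", "BETA", "GA", "DEPRECATED")


def build_custom_role(
    title: str,
    description: str,
    base_permissions: Iterable[str],
    keep: Iterable[str],
    stage: str = "ALPHA",
) -> Dict[str, Any]:
    """Return a role definition limited to `keep`, a subset of `base_permissions`."""
    if stage not in LAUNCH_STAGES:
        raise ValueError(f"unknown launch stage: {stage}")
    keep_set = set(keep)
    unknown = keep_set - set(base_permissions)
    if unknown:
        raise ValueError(f"not in the base role: {sorted(unknown)}")
    return {
        "title": title,
        "description": description,
        "stage": stage,
        "includedPermissions": sorted(keep_set),
    }


# Illustrative sample of permissions to start from (an assumption for this sketch).
base = [
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
    "bigquery.tables.update",
]
role = build_custom_role(
    "Table Reader",
    "Read-only access to table metadata",
    base,
    keep=["bigquery.tables.get", "bigquery.tables.list"],
)
print(role)
```

Starting in the ALPHA stage by default mirrors the usual promotion path described above; the stage would be changed to BETA or GA once the role has been tested.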

Using Roles with Service Accounts
Service accounts are a type of identity often used with VM instances and applications, which are able to make API calls authorized by roles assigned to the service account. A service account is identified by a unique email address. The email address name varies by the type of service account. For example:
1. App Engine service account: Uses <PROJECT-ID> followed by @appspot.gserviceaccount.com, such as pde-exam-project-98765@appspot.gserviceaccount.com
2. Compute Engine service account: Uses <PROJECT-NUMBER> followed by -compute@developer.gserviceaccount.com, such as 601440987865-compute@developer.gserviceaccount.com
3. User-defined service account: Uses a chosen account name followed by @<PROJECT-ID>.iam.gserviceaccount.com, such as pde-exam-service-account@pde-exam-project-98765.iam.gserviceaccount.com
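The address patterns for the three kinds of service accounts can be expressed as small helper functions. This is a minimal sketch; the function names are hypothetical, and the project ID and project number are the sample values used in this chapter.

```python
def app_engine_default_sa(project_id: str) -> str:
    """App Engine default service account: <PROJECT-ID>@appspot.gserviceaccount.com."""
    return f"{project_id}@appspot.gserviceaccount.com"


def compute_engine_default_sa(project_number: int) -> str:
    """Compute Engine default service account, keyed by the numeric project number."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def user_managed_sa(account_name: str, project_id: str) -> str:
    """User-defined service account: <NAME>@<PROJECT-ID>.iam.gserviceaccount.com."""
    return f"{account_name}@{project_id}.iam.gserviceaccount.com"


print(app_engine_default_sa("pde-exam-project-98765"))
print(compute_engine_default_sa(601440987865))
print(user_managed_sa("pde-exam-service-account", "pde-exam-project-98765"))
```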
Service accounts don't have passwords and cannot be used to log in interactively via a browser. These accounts are authenticated by a pair of public/private keys.
Consider an application running in a VM that needs to write messages to a Cloud Pub/Sub topic. We could assign the role roles/pubsub.publisher to pde-exam-project-98765@developer.gserviceaccount.com. Note that the application or VM is the entity using this service account in order to get the publish permission (pubsub.topics.publish). With that role assigned, the application could then call the Cloud Pub/Sub API to write messages to a topic created within the same project as the service account.
User-managed keys, also called "external keys", are used from outside of Google Cloud. GCP stores only the public key. The private key is managed by the user. These keys are usually used as application default credentials, which are used for server-to-server authentication.
Access Control with Policies

You use roles to grant identities permission to perform some actions, but you can also control access to resources based on rules associated with the resource. These rules are called policies.
A policy has three parts:
1. Bindings,
2. Metadata,
3. Audit configuration.
Bindings specify how access is granted to a resource. Bindings are made up of members, roles, and conditions. A member is an identity, which could be a user, group, service account, or domain. The role is simply a named collection of permissions. Conditions are logic expressions for describing context-based restrictions.
The metadata of a policy includes an attribute called 'etag', which is used for concurrency control when updating policies. This is needed because multiple applications can write to a policy at the same time. When retrieving a policy, you also retrieve the 'etag', and when writing the policy, you compare the 'etag' that you retrieved with the current etag of the policy. If the two etags are different, then another application wrote to the policy after you retrieved it. In those cases, you should retry the entire update operation. Metadata also includes a version to indicate the iteration of the schema used in the policy.
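The read-modify-write cycle with an etag can be sketched as follows. The InMemoryPolicyStore class is a hypothetical stand-in for a policy API, written only to make the retry loop runnable; in a real service the etag comparison is typically performed on the server when the policy is written.

```python
import copy
from typing import Any, Callable, Dict

Policy = Dict[str, Any]


class ConflictError(Exception):
    """Raised when the policy changed between the read and the write."""


class InMemoryPolicyStore:
    """Hypothetical stand-in for a policy API, used only for this sketch."""

    def __init__(self) -> None:
        self._policy: Policy = {"bindings": [], "etag": "0", "version": 1}

    def get_policy(self) -> Policy:
        return copy.deepcopy(self._policy)

    def set_policy(self, policy: Policy) -> Policy:
        # Reject the write if the caller's etag is not the current one.
        if policy["etag"] != self._policy["etag"]:
            raise ConflictError("policy changed since it was read")
        new_policy = copy.deepcopy(policy)
        new_policy["etag"] = str(int(self._policy["etag"]) + 1)
        self._policy = new_policy
        return copy.deepcopy(new_policy)


def update_policy(
    store: InMemoryPolicyStore,
    modify: Callable[[Policy], None],
    max_attempts: int = 5,
) -> Policy:
    """Retry the entire read-modify-write cycle whenever the etag is stale."""
    for _ in range(max_attempts):
        policy = store.get_policy()  # fresh copy, including the current etag
        modify(policy)
        try:
            return store.set_policy(policy)
        except ConflictError:
            continue  # another writer got in first; start over
    raise ConflictError("gave up after repeated conflicts")
```

The important design point is that a conflict restarts from get_policy rather than re-sending the same body, so the change is always applied on top of the latest policy.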
Here is an example of a binding that binds the user data-engineer@example.com to the role roles/resourcemanager.projectCreator:
{
  "bindings": [
    {
      "members": [
        "user:data-engineer@example.com"
      ],
      "role": "roles/resourcemanager.projectCreator"
    }
  ],
  "etag": "adjfadURHlad",
  "version": 1
}
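A policy of this shape is plain JSON, so it can be loaded and edited as a dictionary. The sketch below adds a member to an existing binding, or creates a new binding when the role is not yet present. The add_member helper and the group address are hypothetical, and conditions are ignored for simplicity.

```python
import json
from typing import Any, Dict

EXAMPLE_POLICY_JSON = """
{
  "bindings": [
    {
      "members": ["user:data-engineer@example.com"],
      "role": "roles/resourcemanager.projectCreator"
    }
  ],
  "etag": "adjfadURHlad",
  "version": 1
}
"""


def add_member(policy: Dict[str, Any], role: str, member: str) -> Dict[str, Any]:
    """Grant `role` to `member`, reusing the role's binding if one exists."""
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy


policy = json.loads(EXAMPLE_POLICY_JSON)
add_member(policy, "roles/resourcemanager.projectCreator", "group:data-eng@example.com")
print(json.dumps(policy, indent=2))
```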
Audit configurations describe which permission types are logged and which identities are exempt from logging. For example, the following audit configuration enables logging on reads and writes and exempts the user data-engineer@example.com from logging read operations.
{
  "auditLogConfigs": [
    {
      "logType": "DATA_READ",
      "exemptedMembers": ["user:data-engineer@example.com"]
    },
    {
      "logType": "DATA_WRITE"
    }
  ]
}
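To make the exemption rule concrete, here is a small Python sketch that decides whether an operation by a given member would be logged under a configuration fragment like this one. The is_logged helper is hypothetical and only interprets the fragment itself; it does not model logs that a platform may always record regardless of this configuration.

```python
from typing import Any, Dict

AUDIT_CONFIG: Dict[str, Any] = {
    "auditLogConfigs": [
        {
            "logType": "DATA_READ",
            "exemptedMembers": ["user:data-engineer@example.com"],
        },
        {"logType": "DATA_WRITE"},
    ]
}


def is_logged(audit_config: Dict[str, Any], log_type: str, member: str) -> bool:
    """True if `log_type` is enabled and `member` is not exempted from it."""
    for entry in audit_config.get("auditLogConfigs", []):
        if entry.get("logType") == log_type:
            return member not in entry.get("exemptedMembers", [])
    return False  # log type not enabled in this fragment


print(is_logged(AUDIT_CONFIG, "DATA_READ", "user:data-engineer@example.com"))
print(is_logged(AUDIT_CONFIG, "DATA_WRITE", "user:data-engineer@example.com"))
```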
Policies can be defined at different levels of the resource hierarchy, including organizations, folders, projects, and individual resources. Only one policy at a time can be assigned to an organization, folder, project, or individual resource.
Policies are inherited through the resource hierarchy. Folders inherit the policies of the organization. If a folder is created within another folder, it inherits the policies of the encapsulating folder. Projects inherit the policies of the organization and any higher-level folders. Resources inherit the policies of the organization, folders, and projects above them in the hierarchy. The combination of policies directly assigned to a resource, and the policies inherited from ancestors in the resource hierarchy, is called the 'effective policy'.
It is important to remember that IAM is additive only. You can't revoke, for example, permissions at a project level that were granted at the folder level.
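Inheritance and the additive rule can be modeled as a union of grants along the path from a resource up to the organization. The hierarchy, resource names, and grants below are hypothetical, and each policy is simplified to a set of (role, member) pairs.

```python
from typing import Dict, Optional, Set, Tuple

Grant = Tuple[str, str]  # (role, member)

# Hypothetical resource hierarchy: each node maps to its parent.
PARENT: Dict[str, Optional[str]] = {
    "organizations/example-org": None,
    "folders/analytics": "organizations/example-org",
    "projects/pde-exam-project-98765": "folders/analytics",
}

# Policies attached directly at each level.
DIRECT: Dict[str, Set[Grant]] = {
    "organizations/example-org": {("roles/viewer", "group:auditors@example.com")},
    "folders/analytics": {("roles/bigquery.user", "group:analysts@example.com")},
    "projects/pde-exam-project-98765": {
        ("roles/bigquery.admin", "user:data-engineer@example.com")
    },
}


def effective_policy(node: str) -> Set[Grant]:
    """Union of the node's own grants and everything inherited from ancestors."""
    grants: Set[Grant] = set()
    current: Optional[str] = node
    while current is not None:
        grants |= DIRECT.get(current, set())
        current = PARENT.get(current)
    return grants


for grant in sorted(effective_policy("projects/pde-exam-project-98765")):
    print(grant)
```

Because the effective policy is built only with set union, nothing attached at the project level can remove a grant made at the folder or organization level, which is the additive behavior described above.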
Cloud IAM is a comprehensive service that provides for fine-grained access controls through the use of roles and policies. Predefined roles are available in Cloud IAM and are designed to group permissions needed for common use cases, such as administration of a database. Custom roles can be created when the predefined roles don't meet your specific needs, especially with respect to the principle of least privilege. Roles can also be used with service accounts to provide authorizations to VMs and applications. Also, access control policies can be used to control access to resources.
Using IAM with Storage and Processing Services
The previous section described IAM in general. Now let's take a look at some specific examples of how IAM predefined roles can be used with the following GCP services:
1. Cloud Storage.
2. Cloud Bigtable.
3. BigQuery.
4. Cloud Dataflow.
Of course, there are other relevant services, but once you understand these, you should be able to generalize to other services as well.