AZ-204 - Developer associate - Storage solutions

Last updated Jul 29, 2022 Published Jun 6, 2022

Storage is one of the main topics to get familiar with for AZ-204 (and for other exams as well). In this section we go over different aspects of Azure storage accounts and their services, more specifically: access keys, the AzCopy tool, blobs, redundancy and Cosmos DB.

Storage solutions

To follow the code examples, the .NET SDK must be installed. For Linux, installation instructions can be found in Microsoft's official documentation.

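  • An Azure storage account provides cloud storage for the following services:
    • blob: general-purpose object storage (images, videos)
      • requires a container
        • the container access level can be private (no anonymous access), blob (anonymous read access for blobs only) or container (anonymous read access for the container and its blobs)
      • files are stored as objects (blobs) - I related this one to Amazon S3
    • table: structured, non-relational (NoSQL) data
    • queue: message queues used to decouple components
    • file: file shares (SMB) that can be mounted by multiple VMs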
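Every blob is addressed by its account, its container and its blob name. The following minimal Python sketch builds that address. It assumes the default `blob.core.windows.net` endpoint of the Azure public cloud (sovereign clouds use other suffixes), and the helper `blob_url` and the account and container names are hypothetical.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Hypothetical helper: URL of a blob in the Azure public cloud."""
    # '/' in a blob name acts as a virtual folder separator, so it is kept unescaped.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"


print(blob_url("mystorageacct", "images", "2022/cat photo.png"))
# https://mystorageacct.blob.core.windows.net/images/2022/cat%20photo.png
```

The container is always the first path segment. Anything after it, including slashes, is part of the blob name, which is why "folders" in blob storage are only virtual.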

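Access keys, shared access signatures and Azure Active Directory

  • Azure Storage Explorer (desktop app)
  • access keys (grant full access to the storage account)
    • SAS (Shared access signatures)
      • can define an expiration date
      • limited access to specific services and permissions
      • an ad hoc SAS can't be revoked individually: it stays valid until it expires or the account key used to sign it is regenerated
    • Stored access policies
      • can be revoked (a SAS linked to a policy stops working when the policy is changed or deleted), fine-grained access
  • Azure Active Directory
    • Access control (IAM - Identity and Access Management)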
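The reason an ad hoc SAS cannot be revoked is that it is only a signature computed with the account key, and the service keeps no record of issued tokens. The following Python sketch illustrates this. The signing formula (Base64 of HMAC-SHA256 keyed with the Base64-decoded account key) follows the documented scheme. The string-to-sign, however, is deliberately simplified to three fields. Expiry checking is not modeled, and the keys and resource path are made up.

```python
import base64
import hashlib
import hmac


def sign(account_key_b64: str, string_to_sign: str) -> str:
    """SAS signature: Base64(HMAC-SHA256(Base64Decode(account key), UTF-8 string-to-sign))."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def string_to_sign(permissions: str, expiry: str, resource: str) -> str:
    # Simplified on purpose: the real string-to-sign has more fields whose order
    # depends on the signed version (sv) of the SAS.
    return "\n".join([permissions, expiry, resource])


def is_valid(account_key_b64: str, permissions: str, expiry: str, resource: str, signature: str) -> bool:
    """What the service does: recompute the signature with the current key and compare."""
    expected = sign(account_key_b64, string_to_sign(permissions, expiry, resource))
    return hmac.compare_digest(expected, signature)


# Hypothetical keys, not real credentials.
KEY_1 = base64.b64encode(b"demo-account-key-1").decode()
KEY_1_ROTATED = base64.b64encode(b"demo-account-key-2").decode()
EXPIRY = "2022-08-01T00:00:00Z"
RESOURCE = "/blob/myaccount/images/cat.png"

token_sig = sign(KEY_1, string_to_sign("r", EXPIRY, RESOURCE))
print(is_valid(KEY_1, "r", EXPIRY, RESOURCE, token_sig))          # True
print(is_valid(KEY_1, "rw", EXPIRY, RESOURCE, token_sig))         # False: permissions were tampered with
print(is_valid(KEY_1_ROTATED, "r", EXPIRY, RESOURCE, token_sig))  # False: the key was regenerated
```

Only the third case invalidates an existing token. This is why regenerating the key, or linking the SAS to a stored access policy, is the way to cut off access.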

Revoke a user delegation SAS

  • Revoke the user delegation keys
  • Remove the role assignment for the security principal

Access tiers

hot

  • selected by default when creating storage account

cool (lower storage cost, higher access cost)

  • the default access tier (hot or cool) is set at the account level when the storage account is created

archive

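  • it adds a rehydration step before the file can be read (it takes time: up to 15 hours with standard priority, or possibly under 1 hour with high priority for blobs smaller than 10 GB)
  • can only be set on individual blobs (it cannot be the account default)
  • to retrieve an archived blob, edit it and change its tier to hot or cool, choosing the rehydrate priority (standard or high)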

General-purpose v1 storage accounts do not support Event Grid.

Performance

  • Standard
  • Premium
    • Block blobs
    • File shares
    • Page blobs

Redundancy

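  • LRS (Locally Redundant Storage): three copies within a single datacenter
  • GRS (Geo-Redundant Storage): LRS in the primary region plus an asynchronous copy in a secondary region
  • ZRS (Zone-Redundant Storage): three copies spread across availability zones in the primary region
  • GZRS (Geo-Zone-Redundant Storage): ZRS in the primary region plus an asynchronous copy in a secondary region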
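The following Python sketch restates the four options as data keyed by the SKU names used in ARM templates (the `"sku": {"name": ...}` field). It lets you pick the SKUs that protect against a zone outage, a region outage or both. It covers only the four options listed above. The read-access variants (RA-GRS, RA-GZRS) are deliberately left out, and the helper `skus_for` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    copies: int              # total copies of the data
    across_zones: bool       # primary copies spread across availability zones
    secondary_region: bool   # asynchronous copy in a secondary region


# SKU names as used in ARM templates ("sku": {"name": ...})
REPLICATION = {
    "Standard_LRS": Replication(copies=3, across_zones=False, secondary_region=False),
    "Standard_ZRS": Replication(copies=3, across_zones=True, secondary_region=False),
    "Standard_GRS": Replication(copies=6, across_zones=False, secondary_region=True),
    "Standard_GZRS": Replication(copies=6, across_zones=True, secondary_region=True),
}


def skus_for(need_zones: bool, need_region: bool) -> list:
    """SKUs that satisfy the requested protection (zone outage and/or region outage)."""
    return [
        sku for sku, r in REPLICATION.items()
        if (r.across_zones or not need_zones) and (r.secondary_region or not need_region)
    ]


print(skus_for(need_zones=True, need_region=False))  # ['Standard_ZRS', 'Standard_GZRS']
print(skus_for(need_zones=False, need_region=True))  # ['Standard_GRS', 'Standard_GZRS']
```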

Data protection

soft delete

Resources marked as deleted (with soft delete) are retained for a configurable retention period (1 to 365 days for blobs and containers). During that period the deleted object can be recovered, essentially undoing the deletion.


Blobs

  • Lifecycle rules
    • add a rule for all blobs, or limit it to some blobs with filters
    • example of rules
      • if the blob has not been modified for one day, move it to the cool tier
      • rules can take up to 24h to be applied
  • Blob versioning
    • each version is an immutable copy of the blob; previous versions are billed as stored data
    • To enable versioning: Data protection -> check the "Turn on versioning for blobs" box
  • Blob snapshots
    • read-only picture of a particular blob at a point in time
    • a blob that has snapshots can only be deleted if its snapshots are deleted too
  • Soft delete
    • Data protection -> check "Turn on soft delete for blobs" (retention set as a number of days)
    • it also recovers blob snapshots
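The "move to cool after one day without modification" rule corresponds to a lifecycle management policy document. The following Python sketch builds that JSON. It assumes a hypothetical container `images` and rule name `move-to-cool`. The printed JSON could be saved as `policy.json` and applied with `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.

```python
import json


def tier_to_cool_policy(rule_name: str, prefix: str, days: int) -> dict:
    """Lifecycle rule: move block blobs under `prefix` to cool after `days` days without modification."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Prefixes start with the container name.
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                },
            }
        ]
    }


policy = tier_to_cool_policy("move-to-cool", "images/", 1)
print(json.dumps(policy, indent=2))
```

Omitting `prefixMatch` makes the rule apply to every block blob in the account, which matches the "add a rule to all blobs" option.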

Using blob storage sdk

Note: to follow the code examples, the Azure.Storage.Blobs package is required.

The exam refers to versions 11 and 12 of the SDK, but versions 12 and 13 can be found in the mock exams.

  • Azure.Storage.Blobs - v12
    • BlobServiceClient class
      • package Azure.Storage.Blobs
      • works with Azure Storage resources and blob containers (e.g. creates containers)
    • BlobContainerClient class
      • package Azure.Storage.Blobs
      • works with storage containers and their blobs
        • uploads a blob to the container (sync or async), with a flag to overwrite the blob if it already exists
        • lists all blobs in a container
    • BlobClient class
      • works with storage blobs (e.g. downloads a blob)
    • BlobDownloadInfo class
      • represents the content returned from a downloaded blob
    • BlobSasBuilder class
      • namespace Azure.Storage.Sas
      • builds shared access signatures (SAS) programmatically
    • Each blob can hold metadata
      • metadata can be read programmatically through the GetProperties method of BlobClient
    • Lease (exclusive write lock)
      • to acquire the lease, the GetBlobLeaseClient extension method (namespace Azure.Storage.Blobs.Specialized) returns the lease client
        • on the lease client, the Acquire method is called with the lease duration
        • in the end, call the Release method to release the lease; if Release is never called, a fixed-duration lease (15 to 60 seconds) expires on its own, while an infinite lease (-1) stays until it is released or broken
    • Streams can be used to change the content of a file in memory
      • MemoryStream
      • StreamReader
      • StreamWriter
  • ARM templates
    • automate the creation of storage accounts
  • Change feed (ordered log of blob changes, stored in Apache Avro format)
    • supported by general-purpose v2 and Blob storage accounts
    • Data protection -> turn on change feed (it creates a container named $blobchangefeed in the storage account)
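The lease rules above can be summarized in a small model. The following Python sketch is a toy in-memory model, not the Azure SDK: the class `ToyBlobLease` and its explicit `now` parameter are hypothetical. It shows that a fixed lease of 15 to 60 seconds expires by itself when Release is never called, while an infinite lease (-1) holds until it is released.

```python
from typing import Optional


class ToyBlobLease:
    """In-memory model of blob lease rules (an illustration, not the Azure SDK)."""

    INFINITE = -1

    def __init__(self) -> None:
        self.lease_id: Optional[str] = None
        self.expires_at: Optional[float] = None  # None with a lease_id set means an infinite lease

    def is_leased(self, now: float) -> bool:
        if self.lease_id is None:
            return False
        return self.expires_at is None or now < self.expires_at

    def acquire(self, lease_id: str, duration: int, now: float) -> None:
        # Fixed leases last 15 to 60 seconds; -1 requests an infinite lease.
        if duration != self.INFINITE and not 15 <= duration <= 60:
            raise ValueError("duration must be 15-60 seconds or -1 (infinite)")
        if self.is_leased(now) and lease_id != self.lease_id:
            raise RuntimeError("blob is already leased by someone else")
        self.lease_id = lease_id
        self.expires_at = None if duration == self.INFINITE else now + duration

    def release(self, lease_id: str) -> None:
        if lease_id == self.lease_id:
            self.lease_id = None
            self.expires_at = None

    def can_write(self, lease_id: Optional[str], now: float) -> bool:
        # While a lease is active, writes must present the matching lease ID.
        return not self.is_leased(now) or lease_id == self.lease_id


lease = ToyBlobLease()
lease.acquire("lease-a", duration=15, now=0)
print(lease.can_write(None, now=10))  # False: lease still active
print(lease.can_write(None, now=16))  # True: the 15-second lease expired without Release
```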

Authorization for blob or queue

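When authenticating with Azure AD credentials, API access to storage account services depends on the API permissions defined when the application is registered. The identity also needs an RBAC data role on the resource (for example, Storage Blob Data Reader or Storage Queue Data Contributor).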

ARM template

ARM template to deploy three storage accounts:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageCount": {
      "type": "int",
      "defaultValue": 3
    }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2019-04-01",
      "name": "[concat(copyIndex(),'storage', uniqueString(resourceGroup().id))]",
      "location": "[resourceGroup().location]",
      "sku": {
        "name": "Standard_LRS"
      },
      "kind": "Storage",
      "properties": {},
      "copy": {
        "name": "storagecopy",
        "count": "[parameters('storageCount')]"
      }
    }
  ]
}
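The template builds each name from `copyIndex()`, the literal `storage` and `uniqueString(...)`. Storage account names must be 3 to 24 characters of lowercase letters and digits, so that combination has to stay within those limits. The following Python sketch checks the generated names. It uses a made-up 13-character stand-in for the output of `uniqueString`, and the helper name is hypothetical.

```python
import re

# Storage account names: 3 to 24 characters, lowercase letters and digits only.
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    return _STORAGE_NAME.fullmatch(name) is not None


# Hypothetical 13-character stand-in for uniqueString(resourceGroup().id)
unique = "abc123def456g"
names = [f"{i}storage{unique}" for i in range(3)]  # mirrors concat(copyIndex(), 'storage', ...)
print(names)  # ['0storageabc123def456g', '1storageabc123def456g', '2storageabc123def456g']
print(all(is_valid_storage_account_name(n) for n in names))  # True
```

With these parts each name is 21 characters long, which leaves room for a two-digit copy index.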

AzCopy tool

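Copies data to, from and between storage accounts; installation instructions are available in Microsoft's official documentation. The URLs below must be authorized, for example with a SAS token appended to the URL or after signing in with azcopy login.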

From local to azure storage account container (Upload)

  • azcopy make CONTAINER_URL - creates a container
  • azcopy copy FILE_NAME CONTAINER_URL
  • azcopy copy "dir/*" CONTAINER_URL (not recursive, subfolders are not included)
  • azcopy copy "dir/*" CONTAINER_URL --recursive (includes subdirectories) - --recursive=true is also accepted

From azure storage account container to local (Download)

  • azcopy copy FILE_AND_CONTAINER_URL my_local_file.txt
  • azcopy copy CONTAINER_URL "." --recursive (includes subdirectories)

Copy between two storage accounts

  • azcopy copy SOURCE_STORAGE_ACCOUNT_URL DESTINATION_STORAGE_ACCOUNT_URL
  • azcopy copy SOURCE_STORAGE_ACCOUNT_URL DESTINATION_STORAGE_ACCOUNT_URL --recursive (copies everything in the storage account)

Sync one storage account container with another

azcopy sync FROM_STORAGE TARGET_STORAGE
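When AzCopy is driven from a script, it can help to build the argument list explicitly. The following Python sketch assembles a SAS-authorized container URL and the `azcopy copy` arguments used above. The account, container and `<SAS_TOKEN>` values are placeholders, and both helpers are hypothetical. The returned list could be passed to `subprocess.run(args, check=True)` on a machine with AzCopy installed.

```python
from typing import List


def container_url(account: str, container: str, sas_token: str = "") -> str:
    """Container URL, optionally authorized with a SAS token appended as the query string."""
    url = f"https://{account}.blob.core.windows.net/{container}"
    return f"{url}?{sas_token.lstrip('?')}" if sas_token else url


def azcopy_copy(source: str, destination: str, recursive: bool = False) -> List[str]:
    """Argument list for `azcopy copy`, ready for subprocess.run(args, check=True)."""
    args = ["azcopy", "copy", source, destination]
    if recursive:
        args.append("--recursive")
    return args


args = azcopy_copy("dir/*", container_url("myaccount", "mycontainer", "<SAS_TOKEN>"), recursive=True)
print(args)
```

Without a shell, "dir/*" reaches AzCopy unexpanded and AzCopy resolves the wildcard itself. This is the same effect the quotes achieve in the shell examples.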

Azure CLI tool

  • az storage copy (uses AzCopy under the hood)
  • az storage blob delete
  • az storage blob download
  • az storage blob sync
  • az storage blob upload
  • az storage azcopy run-command <-- uses SAS

The official documentation for running AzCopy from the Azure CLI is available in Microsoft's Azure CLI reference.

File shares (kind of a Dropbox)

  • Tiers
    • Hot - for frequent access
    • Cool - for files that are not accessed frequently - lower storage cost, higher transaction cost
    • Premium - SSD-backed; unavailable when the account performance is standard (requires a premium file storage account)
    • Transaction optimized - general-purpose, transaction-heavy workloads
  • supported across Windows, Linux and macOS
  • once the file share is created, it can be mounted as a drive on a host to share files
  • multiple machines can mount the same file share
  • access is often blocked by firewalls, since SMB uses port 445

Table storage

  • stores entities identified by keys, each with attribute (property) values
    • non-relational structured data
  • partition key
    • divides the logical data into different partitions to speed up lookups
  • row key
    • unique within a partition; together with the partition key it identifies an entity
  • Microsoft.Azure.Cosmos.Table package
    • CloudStorageAccount represents the storage account (parsed from the connection string)
    • from the account, the CloudTableClient class references the table
    • map a custom entity by deriving from TableEntity
    • TableOperation performs the CRUD operations
    • batch operations are supported through the TableBatchOperation class (all entities in a batch must share the same partition key)
    • TableOperation.Retrieve is used to fetch data

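Tip: can I retrieve data without the partition key? TableOperation.Retrieve needs both the partition key and the row key. A query can still filter on other properties, but without the partition key it has to scan every partition, which is much slower.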
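The difference between a point read and a cross-partition query can be shown with a toy model. The following Python sketch is an in-memory illustration, not the Azure SDK: the class `ToyTable` and the sample entities are hypothetical. It shows why both keys are needed for a direct lookup and why a filter without a partition key has to visit every partition.

```python
from typing import Callable, Dict, List, Optional


class ToyTable:
    """In-memory model of Table storage addressing (an illustration, not the Azure SDK)."""

    def __init__(self) -> None:
        # Entities are grouped by partition, then addressed by row key.
        self._partitions: Dict[str, Dict[str, dict]] = {}

    def upsert(self, partition_key: str, row_key: str, **props) -> None:
        entity = {"PartitionKey": partition_key, "RowKey": row_key, **props}
        self._partitions.setdefault(partition_key, {})[row_key] = entity

    def point_read(self, partition_key: str, row_key: str) -> Optional[dict]:
        # Like TableOperation.Retrieve: both keys identify exactly one entity.
        return self._partitions.get(partition_key, {}).get(row_key)

    def query(self, predicate: Callable[[dict], bool], partition_key: Optional[str] = None) -> List[dict]:
        # Without a partition key every partition has to be scanned.
        partitions = [self._partitions.get(partition_key, {})] if partition_key else self._partitions.values()
        return [e for p in partitions for e in p.values() if predicate(e)]


table = ToyTable()
table.upsert("customers-pt", "001", name="Ana", city="Lisbon")
table.upsert("customers-br", "002", name="Bruno", city="Recife")
print(table.point_read("customers-pt", "001")["name"])                    # Ana
print([e["name"] for e in table.query(lambda e: e["city"] == "Recife")])  # ['Bruno']
```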

Storage queue

  • Storage queue service
    • queues are used to decouple applications
    • a simpler solution compared with Service Bus
  • To interact with queues in C#, the package used is Azure.Storage.Queues
    • QueueClient connects to the queue (connection string and queue name)
    • use the SendMessage method to send items to the queue
    • PeekMessage or PeekMessages (models in Azure.Storage.Queues.Models) read messages without removing them from the queue
    • ReceiveMessage returns a QueueMessage and hides it for a visibility timeout; it must be deleted manually with the DeleteMessage method, otherwise it reappears in the queue when the timeout ends
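The difference between peeking and receiving can be illustrated with a toy model. The following Python sketch is an in-memory illustration, not Azure.Storage.Queues: the class `ToyQueue` and its explicit `now` parameter are hypothetical, and pop receipts are not modeled. It uses a 30-second visibility timeout, the default when receiving a message.

```python
import itertools
from typing import List, Optional


class ToyQueue:
    """In-memory model of Storage queue semantics (an illustration, not Azure.Storage.Queues)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._messages: List[dict] = []

    def send(self, text: str) -> None:
        self._messages.append({"id": next(self._ids), "text": text, "visible_at": 0.0})

    def peek(self, now: float) -> List[str]:
        # Peeking never changes the queue.
        return [m["text"] for m in self._messages if m["visible_at"] <= now]

    def receive(self, now: float, visibility_timeout: float = 30.0) -> Optional[dict]:
        # Receiving hides the first visible message; it is NOT removed.
        for m in self._messages:
            if m["visible_at"] <= now:
                m["visible_at"] = now + visibility_timeout
                return m
        return None

    def delete(self, message_id: int) -> None:
        self._messages = [m for m in self._messages if m["id"] != message_id]


q = ToyQueue()
q.send("order-1")
print(q.peek(now=0))   # ['order-1']
msg = q.receive(now=0)
print(q.peek(now=10))  # [] (hidden during the visibility timeout)
print(q.peek(now=31))  # ['order-1'] (reappears because it was not deleted)
q.delete(msg["id"])
print(q.peek(now=31))  # []
```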

It is possible to create an Azure Function triggered by a queue.

  • the Functions queue trigger requires messages to be Base64 encoded (to read and push)
  • the function tries to process a message 5 times (maxDequeueCount); if it still fails, the message is moved to a poison queue named <queue-name>-poison
  • the package used for functions and queues is Microsoft.Azure.WebJobs
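The retry-then-poison behavior can be simulated in a few lines. The following Python sketch models it under stated assumptions: the message body is Base64 text, a failing handler is retried until the dequeue count reaches 5 (the Functions default, set in host.json under `extensions.queues.maxDequeueCount`), and the message then goes to `<queue>-poison`. The helper names and the `orders` queue are hypothetical, and this is not the Functions runtime.

```python
import base64
from typing import Callable, Tuple

MAX_DEQUEUE_COUNT = 5  # Functions default, configurable in host.json (extensions.queues.maxDequeueCount)


def encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode(body: str) -> str:
    return base64.b64decode(body).decode("utf-8")


def process(queue_name: str, body: str, handler: Callable[[str], None],
            max_dequeue_count: int = MAX_DEQUEUE_COUNT) -> Tuple[str, int]:
    """Return (outcome, attempts): 'processed' or the name of the poison queue."""
    for attempt in range(1, max_dequeue_count + 1):
        try:
            handler(decode(body))
            return "processed", attempt
        except Exception:
            pass  # the message becomes visible again and is retried
    return f"{queue_name}-poison", max_dequeue_count


def always_fails(text: str) -> None:
    raise ValueError(f"cannot handle {text}")


print(process("orders", encode("order-1"), always_fails))  # ('orders-poison', 5)
```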

Azure Functions can also read from a queue and store the information in a table (queue trigger plus table output binding).

  • Azure service bus

Azure Cosmos DB

  • Fully managed NoSQL database
    • there are no foreign keys or relationships in Cosmos DB; related data is embedded instead (like nested objects)
  • Highly available (see the consistency levels below)
  • APIs available for Cosmos DB are SQL API, Table API, MongoDB API, Gremlin API and Cassandra API
    • you choose which one to use when creating the Cosmos DB account
  • Capacity mode
    • provisioned throughput: charged by the request units per second (RU/s) provisioned, plus storage
    • serverless: charged by the request units consumed, plus storage
    • the free tier offers 1000 RU/s and 25 GB of storage
  • the package to interact with Cosmos DB is Microsoft.Azure.Cosmos
    • the connection string is under Keys
    • CosmosClient is used to connect to the database
    • Cosmos DB provides a change feed (inserts and updates of documents; deletes are not included)
  • Cosmos DB supports stored procedures and triggers
  • Composite indexes
    • required when a query orders by two or more fields (ORDER BY); otherwise the query fails with an error
    • to add a composite index
      1. Under the container
      2. Settings
      3. Indexing Policy
  • Time to live (TTL)
  • Cosmos DB Table API uses partitions to enable efficient queries
  • Change feed design patterns in Azure Cosmos DB
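A composite index is added in the container's indexing policy. The following Python sketch builds such a policy with a single composite index for ordering by two properties. The property names `lastName` and `age` are hypothetical, and the rest of the policy is the usual consistent, index-everything setup. The printed JSON is what would go into Settings -> Indexing Policy.

```python
import json


def composite_index_policy(first: str, second: str) -> dict:
    """Indexing policy with one composite index: `first` ascending, `second` descending."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
        "compositeIndexes": [
            [
                {"path": f"/{first}", "order": "ascending"},
                {"path": f"/{second}", "order": "descending"},
            ]
        ],
    }


policy = composite_index_policy("lastName", "age")
print(json.dumps(policy, indent=2))
# Serves: SELECT * FROM c ORDER BY c.lastName ASC, c.age DESC
# (and the exact reverse: ORDER BY c.lastName DESC, c.age ASC)
```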

The full Microsoft documentation of the Azure CLI commands for creating a Cosmos DB account is also worth keeping in mind.

Consistency

Cosmos DB offers different consistency levels, namely:

  • Strong (reads always return the most recent committed write in every region; increases latency)
  • Bounded Staleness (replicates data asynchronously; readers in the write region get the same guarantees as with strong consistency)
    • reads lag behind writes by at most K versions or a T time interval, both configurable (for single-region accounts the minimum is 10 writes or 5 seconds)
  • Session (client-centric: clients sharing the same session read their own writes)
  • Consistent Prefix (readers never see out-of-order writes, although they may miss the latest ones)
  • Eventual (no ordering guarantee for reads; readers may see writes out of order)

Microsoft's documentation depicts the consistency levels as a spectrum:

    Strong --- Bounded Staleness --- Session --- Consistent Prefix --- Eventual
    Stronger consistency <-------------> Higher availability, lower latency, higher throughput
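Two of these guarantees can be expressed as simple checks. The following Python sketch shows both. A consistent-prefix read must be an in-order prefix of the committed writes. A bounded-staleness read, considering only the version bound K, may lag the latest write by at most K versions. The function names and sample values are hypothetical, and real Cosmos DB reads are not modeled.

```python
from typing import Sequence


def is_consistent_prefix(writes: Sequence[str], observed: Sequence[str]) -> bool:
    """Consistent prefix: a reader sees the writes in order, possibly missing the newest ones."""
    return list(observed) == list(writes[: len(observed)])


def within_bounded_staleness(latest_version: int, read_version: int, max_lag_versions: int) -> bool:
    """Bounded staleness (version bound only): the read lags the latest write by at most K versions."""
    return 0 <= latest_version - read_version <= max_lag_versions


writes = ["A", "B", "C"]
print(is_consistent_prefix(writes, ["A", "B"]))  # True
print(is_consistent_prefix(writes, ["A", "C"]))  # False: not a prefix, only possible with eventual
print(within_bounded_staleness(latest_version=120, read_version=112, max_lag_versions=10))  # True
```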

Partition key

Partition keys are used to spread the workload evenly in Cosmos DB, so there are cases in which a single property makes a poor partition key. For example, if you have hundreds of items but only a few distinct values among them, splitting the data evenly becomes difficult.

The alternatives to that end are:

  • concatenating multiple property values, or appending a random suffix (known as synthetic partition keys - apparently this applies to the SQL API)
  • appending a hash (precalculated) suffix to a property value
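The synthetic key strategies can be sketched as small key-building functions. In the following Python sketch, the property values (a device ID, a date, a vehicle ID) and the bucket count of 10 are hypothetical. The hash suffix uses `hashlib` rather than Python's built-in `hash()`, because the built-in string hash changes between processes and the key must be reproducible for reads.

```python
import hashlib
import random


def synthetic_key(*values: str) -> str:
    """Concatenate several property values, e.g. 'device42-2022-07-29'."""
    return "-".join(values)


def random_suffix_key(value: str, buckets: int, rng: random.Random) -> str:
    """Spreads writes evenly, but reads must fan out over all `buckets` suffixes."""
    return f"{value}-{rng.randrange(buckets)}"


def hash_suffix_key(value: str, hashed_property: str, buckets: int) -> str:
    """Deterministic suffix: the same property value always lands in the same bucket."""
    digest = hashlib.sha256(hashed_property.encode("utf-8")).hexdigest()
    return f"{value}-{int(digest, 16) % buckets}"


print(synthetic_key("device42", "2022-07-29"))        # device42-2022-07-29
print(random_suffix_key("2022-07-29", 10, random.Random(7)))
print(hash_suffix_key("2022-07-29", "VIN-1234", 10))  # same output on every run
```

The trade-off: a random suffix gives the most even write distribution, while a hash suffix lets you recompute the full key when reading a specific item.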

Triggers

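Cosmos DB triggers (pre-triggers and post-triggers) do not fire automatically; a request has to specify them explicitly. The operations a trigger applies to are defined in the TriggerOperation enum:

  • Delete
  • Update
  • Create
  • Replace
  • All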

CLI with cosmos

  • az cosmosdb create
resourceGroup=my-group
accountName=my-name
databaseName=my-db

consistencyLevel=Strong

# --max-interval (seconds) only takes effect with the BoundedStaleness consistency level
az cosmosdb create --name "$accountName" \
  --resource-group "$resourceGroup" \
  --max-interval 5 \
  --enable-automatic-failover true \
  --default-consistency-level "$consistencyLevel" \
  --locations regionName=southcentralus failoverPriority=0 isZoneRedundant=False \
  --locations regionName=northcentralus failoverPriority=1 isZoneRedundant=True


CosmosDB RBAC

Data factory

  • ETL (Extract, Transform, Load) tool used to move and transform data
  • Create a resource -> Data Factory
  • example: upload a CSV file to a container, then in Data Factory configure
    • source (can even be Amazon S3)
    • destination

Blob trigger Azure Function with output to Cosmos DB

  • packages: the blob trigger comes from Microsoft.Azure.WebJobs.Extensions.Storage, and the Cosmos DB output binding from Microsoft.Azure.WebJobs.Extensions.CosmosDB