AZ-204 - Developer associate - Storage solutions

Last updated Jul 6, 2022 Published Jun 6, 2022

Storage is one of the main concepts to get familiar with for AZ-204 (and also for other exams). The 204 is the entry level for the other certifications.

Storage solutions

To follow the code examples, the dotnet SDK must be installed. For Linux, installation instructions can be found under Microsoft’s official documentation.

  • Azure storage account provides storage on the cloud
    • blob: general purpose (images, videos)
      • requires a container
        • a container's access level can be private, read access for blobs only, or read access for containers and blobs
      • files are stored as objects (blobs) - I actually relate this one to S3
    • table: structured, non-relational table data
    • queue: messaging to decouple applications
    • file: file shares accessible from multiple VMs
  • access keys, shared access signatures and azure active directory
    • Azure Storage Explorer (desktop app)
    • access keys
      • SAS (Shared Access Signatures)
        • can define an expiration date
        • limited access to specific services
        • can't be revoked
      • Stored Access Policies
        • can be revoked, fine-grained access
      • Azure Active Directory
        • Access control (IAM - Identity and Access Management)
      • Access tiers
        • pricing is based on the space used and how frequently the data is accessed
          • hot
            • selected by default when creating a storage account
          • cool (costs least for storage, but more per access)
            • when you create the storage account you set the default tier (cool or hot)
          • archive
            • it adds a rehydrate step to retrieve the file (it takes time, up to 15 hours for standard priority or around 1 hour for high priority)
            • single files can be sent to the archive tier
            • to retrieve an archived blob, change its tier back to hot or cool with a standard or high rehydration priority (see the sketch below)
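A minimal sketch of changing a blob's tier from the SDK, assuming an existing blob and the Azure.Storage.Blobs package (connection string and names are placeholders):

```csharp
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Connection string and names are placeholders.
var blob = new BlobClient(
    connectionString: "<storage-connection-string>",
    blobContainerName: "mycontainer",
    blobName: "report.pdf");

// Send the blob to the archive tier.
blob.SetAccessTier(AccessTier.Archive);

// Rehydrate later: move it back to an online tier.
// RehydratePriority.High is faster (and pricier) than Standard.
blob.SetAccessTier(AccessTier.Hot, rehydratePriority: RehydratePriority.High);
```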

Performance

  • Standard
  • Premium
    • Block blobs
    • File shares
    • Page blobs

Redundancy

  • LRS (Locally Redundant Storage)
  • GRS (Geo-Redundant Storage)
  • ZRS (Zone-Redundant Storage)
  • GZRS (Geo-Zone-Redundant Storage; redundancy is selected via the account SKU, see the CLI example after this list)
  • Advanced
  • Networking
  • Data protection
    • soft deletes
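A hedged CLI example of picking a redundancy option through the SKU (account and group names are placeholders):

  • az storage account create --name mystorageacct --resource-group mygroup --sku Standard_GZRS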

Blobs

  • Life cycle rules
    • add a rule to all blobs or limit the affected blobs with filters
    • example of a rule
      • if the blob has not been modified for one day, move it to the cool tier
      • rules take up to 24h to be applied
  • Blob versioning
    • previous versions are kept automatically and are immutable - pricing?
    • To enable: Data protection -> check the "Turn on versioning for blobs" box
  • Blob snapshots
    • a read-only picture of a particular blob at a point in time
    • deleting a blob requires deleting its snapshots first
  • Soft delete
    • Data protection -> check "Turn on soft delete for blobs" (retention is based on a number of days)
    • it also recovers blob snapshots (see the snapshot sketch below)
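A minimal sketch of creating a snapshot and listing versions with the Azure.Storage.Blobs SDK, assuming versioning is enabled under Data protection (names are placeholders):

```csharp
using System;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var container = new BlobContainerClient("<storage-connection-string>", "mycontainer");
BlobClient blob = container.GetBlobClient("report.pdf");

// Take a point-in-time snapshot of the blob.
Response<BlobSnapshotInfo> snapshot = blob.CreateSnapshot();
Console.WriteLine($"Snapshot id: {snapshot.Value.Snapshot}");

// List every version of every blob in the container
// (requires versioning to be enabled under Data protection).
foreach (BlobItem item in container.GetBlobs(states: BlobStates.Version))
{
    Console.WriteLine($"{item.Name} version: {item.VersionId}");
}
```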

Using the blob storage SDK

Note: to follow the code examples, the Azure.Storage.Blobs package is required.

  • Azure.Storage.Blobs - v12
    • BlobServiceClient class
      • package Azure.Storage.Blobs
      • Works with Azure Storage resources and blob containers (creates containers)
    • BlobContainerClient class
      • package Azure.Storage.Blobs
      • Works with storage containers and blobs (uploads blobs to a container, sync or async - it also offers a flag to overwrite if the file already exists in the container; lists all blobs in a container - see the upload/download sketch after this list)
    • BlobClient class
      • Works with storage blobs (downloads blobs)
    • BlobDownloadInfo class
      • Represents the content returned from a downloaded blob
    • BlobSasBuilder
      • namespace Azure.Storage.Sas (ships with the Azure.Storage.Blobs package)
      • enables setting shared access signatures (SAS) programmatically (see the SAS and lease sketch after this list)
    • Each blob can hold metadata
      • metadata can be accessed programmatically through GetProperties on BlobClient
    • Lease (Exclusive lock)
      • To acquire a lease, the GetBlobLeaseClient method (Azure.Storage.Blobs.Specialized) retrieves the lease client (see the lease sketch after this list)
        • On the lease client the Acquire method is called with the duration of the lease
        • in the end call the Release method to release the lease - if Release is never called, a finite lease (15 to 60 seconds) expires on its own, but an infinite lease must be released or broken explicitly
      • Streams can be used to modify a file's contents in memory
        • MemoryStream
        • StreamReader
        • StreamWriter
  • ARM templates
    • Automation of the storage account creation process
  • Change feed (a stream of the changes, in Apache Avro format)
    • General purpose v2 and Blob storage accounts are supported
    • Data protection -> turn on change feed (it creates a container in the storage account named $blobchangefeed)
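A minimal sketch tying the classes above together: create a container, upload, list, download and read metadata (assuming the Azure.Storage.Blobs package; connection strings and names are placeholders):

```csharp
using System;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var service = new BlobServiceClient("<storage-connection-string>");

// BlobServiceClient works at the account level and creates containers.
BlobContainerClient container = service.GetBlobContainerClient("mycontainer");
container.CreateIfNotExists();

// Upload a local file; overwrite avoids a conflict error
// when the blob already exists in the container.
BlobClient blob = container.GetBlobClient("notes.txt");
blob.Upload("notes.txt", overwrite: true);

// BlobContainerClient lists all blobs in the container.
foreach (BlobItem item in container.GetBlobs())
    Console.WriteLine(item.Name);

// BlobDownloadInfo represents the content returned from a download.
BlobDownloadInfo download = blob.Download().Value;
using (var reader = new StreamReader(download.Content))
    Console.WriteLine(reader.ReadToEnd());

// Each blob can hold metadata, surfaced through GetProperties.
BlobProperties props = blob.GetProperties().Value;
foreach (var kv in props.Metadata)
    Console.WriteLine($"{kv.Key} = {kv.Value}");
```

And a sketch of BlobSasBuilder plus the lease client (GetBlobLeaseClient lives in Azure.Storage.Blobs.Specialized; GenerateSasUri assumes the client was built from a connection string, i.e. a shared key):

```csharp
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;
using Azure.Storage.Sas;

var blob = new BlobClient("<storage-connection-string>", "mycontainer", "notes.txt");

// Read-only SAS for this blob, expiring in one hour.
var sas = new BlobSasBuilder
{
    BlobContainerName = blob.BlobContainerName,
    BlobName = blob.Name,
    Resource = "b", // b = blob
    ExpiresOn = DateTimeOffset.UtcNow.AddHours(1)
};
sas.SetPermissions(BlobSasPermissions.Read);
Uri sasUri = blob.GenerateSasUri(sas);

// Exclusive lock: acquire a 30-second lease, work, then release.
BlobLeaseClient lease = blob.GetBlobLeaseClient();
lease.Acquire(TimeSpan.FromSeconds(30)); // finite leases: 15 to 60 seconds
// ... exclusive work here ...
lease.Release();
```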

AzCopy tool

Copies files from one storage account to another; installation instructions are available under Microsoft’s official documentation.

From local to azure storage account container (Upload)

  • azcopy make CONTAINER_URL - creates a container (it requires a URL)
  • azcopy copy FILE_NAME CONTAINER_URL
  • azcopy copy "dir/*" CONTAINER_URL (not recursive, will not include subfolders)
  • azcopy copy "dir/*" CONTAINER_URL --recursive (will include subdirectories)

From azure storage account container to local (Download)

  • azcopy copy FILE_AND_CONTAINER_URL my_local_file.txt
  • azcopy copy CONTAINER_URL "." --recursive (will include subdirectories)

Copy between two storage accounts

  • azcopy copy SOURCE_STORAGE_ACCOUNT_URL DESTINATION_STORAGE_ACCOUNT_URL
  • azcopy copy SOURCE_STORAGE_ACCOUNT_URL DESTINATION_STORAGE_ACCOUNT_URL --recursive (to copy everything in the storage account)

Azure CLI tool
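The same storage resources can be managed from the terminal with az. A couple of hedged examples (account and container names are placeholders):

  • az storage account create --name mystorageacct --resource-group mygroup --sku Standard_LRS
  • az storage container create --name mycontainer --account-name mystorageacct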

File shares (kind of a dropbox)

  • Tiers
    • Hot - for frequent access
    • Cool - for files that are not consumed frequently - storage cost is lower but access cost is higher
    • Premium - disabled if the account performance is standard
    • Transaction optimized - general purpose storage
  • it is supported across Windows, Linux and macOS
  • Once the file share is created, it can be mounted as a drive on the host to share files (see the Linux mount example below)
  • Multiple machines can mount the same file share
  • A firewall in place usually blocks access (SMB uses port 445)
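On Linux the share is typically mounted over SMB; a hedged sketch where account, share and key are placeholders (and port 445 must be reachable):

  • sudo mount -t cifs //mystorageacct.file.core.windows.net/myshare /mnt/myshare -o vers=3.0,username=mystorageacct,password=<storage-account-key>,dir_mode=0777,file_mode=0777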

Table storage

  • stores data based on keys and attribute values
    • non-relational structured data
  • partition key
    • divides the logical data into different partitions to speed up lookups
  • row key
  • Microsoft.Azure.Cosmos.Table package
    • CloudStorageAccount represents the storage account for a given connection string
    • based on the account, the CloudTableClient class references the table
    • map a custom entity by inheriting from TableEntity
    • TableOperation performs the CRUD operations
    • batch operations are supported through the TableBatchOperation class
    • TableOperation.Retrieve is used to fetch data (see the sketch below)
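A minimal sketch with the Microsoft.Azure.Cosmos.Table package (the entity shape and all names are placeholders):

```csharp
using System;
using Microsoft.Azure.Cosmos.Table;

// A custom entity maps to a table row by inheriting from TableEntity.
public class Book : TableEntity
{
    public Book() { }
    public Book(string genre, string isbn)
    {
        PartitionKey = genre; // partition key divides the data
        RowKey = isbn;        // row key is unique within a partition
    }
    public string Title { get; set; }
}

public static class TableDemo
{
    public static void Run()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("<storage-connection-string>");
        CloudTableClient client = account.CreateCloudTableClient();
        CloudTable table = client.GetTableReference("books");
        table.CreateIfNotExists();

        // CRUD goes through TableOperation.
        var insert = TableOperation.InsertOrReplace(new Book("scifi", "978-0") { Title = "Dune" });
        table.Execute(insert);

        // Point lookup needs both partition key and row key.
        var retrieve = TableOperation.Retrieve<Book>("scifi", "978-0");
        var book = (Book)table.Execute(retrieve).Result;
        Console.WriteLine(book.Title);
    }
}
```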

Tip: can I retrieve without the partition key? (Yes, but a query without the partition key becomes a full table scan, which is slow.)

Storage queue

  • Storage queue service
    • queues are used to decouple applications
    • a simpler solution compared with Service Bus
  • To interact with queues in C#, the package used is Azure.Storage.Queues (see the sketch below)
    • QueueClient connects to the queue (connection string and the queue name)
    • the SendMessage method sends items to the queue
    • PeekMessage or PeekMessages (Azure.Storage.Queues.Models) inspects messages without removing them from the queue
    • ReceiveMessage returns a QueueMessage; it requires a manual delete with the DeleteMessage method
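A minimal sketch with Azure.Storage.Queues (queue name and connection string are placeholders):

```csharp
using System;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;

var queue = new QueueClient("<storage-connection-string>", "orders");
queue.CreateIfNotExists();

// Send a message.
queue.SendMessage("order-42");

// Peek: inspect without removing.
PeekedMessage peeked = queue.PeekMessage().Value;
Console.WriteLine($"Peeked: {peeked.MessageText}");

// Receive: the message stays hidden until deleted explicitly.
QueueMessage msg = queue.ReceiveMessage().Value;
Console.WriteLine($"Processing: {msg.MessageText}");
queue.DeleteMessage(msg.MessageId, msg.PopReceipt);
```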

It is possible to create an Azure Function triggered by a queue.

  • Azure Functions require queue messages to be base64 encoded (both to read and to push)
  • The Azure Function will try 5 times; if none succeeds, it creates a <queue-name>-poison queue and stores the message there
  • The package used for functions and queues is Microsoft.Azure.WebJobs (see the sketch below)
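A minimal sketch of a queue-triggered function (function and queue names are placeholders; the storage connection defaults to the AzureWebJobsStorage setting):

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class OrderProcessor
{
    // Runs whenever a message lands on the "orders" queue.
    // After 5 failed attempts the message moves to "orders-poison".
    [FunctionName("OrderProcessor")]
    public static void Run(
        [QueueTrigger("orders")] string message,
        ILogger log)
    {
        log.LogInformation($"Queue message received: {message}");
    }
}
```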

It is possible to use queues and store the information in a table via Azure Functions.

  • Azure Service Bus is the more feature-rich messaging alternative to storage queues

Azure Cosmos DB

  • Fully managed NoSQL database
    • there are no foreign keys or relationships in Cosmos; there is the concept of embedded data instead (like nested objects)
  • Highly available (see the availability options below)
  • APIs available for Cosmos are the SQL API, Table API, MongoDB API, Gremlin API and Cassandra API
    • You choose which one to consume when creating the Cosmos DB instance
  • Capacity mode
    • charged by the storage used
    • charged by request units
    • 400 RUs and 5 GB of storage are offered for free
  • the package to interact with Cosmos DB is Microsoft.Azure.Cosmos (see the sketch after this list)
    • to connect with Cosmos DB, the connection string is found under Keys
    • CosmosClient is used to connect to the database
    • Cosmos DB provides a change feed (fires when a document is created or updated)
  • Cosmos DB supports stored procedures and triggers
  • Composite indexes
    • are required when ordering data by two fields, otherwise an error is raised
    • to add a composite index
      1. Under container
      2. Settings
      3. Indexing Policy
    • Time to live (TTL)
  • Cosmos DB uses partitions to enable efficient queries
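A minimal sketch with the Microsoft.Azure.Cosmos package (database, container and item shapes are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class CosmosDemo
{
    public static async Task RunAsync()
    {
        // Connection string comes from the portal, under Keys.
        var client = new CosmosClient("<cosmos-connection-string>");

        Database db = await client.CreateDatabaseIfNotExistsAsync("shop");
        Container container = await db.CreateContainerIfNotExistsAsync(
            id: "orders", partitionKeyPath: "/customerId");

        var order = new { id = Guid.NewGuid().ToString(), customerId = "c-1", total = 9.99 };
        await container.CreateItemAsync(order, new PartitionKey(order.customerId));

        // Query within a single partition.
        var query = container.GetItemQueryIterator<dynamic>(
            new QueryDefinition("SELECT * FROM c WHERE c.customerId = 'c-1'"));
        while (query.HasMoreResults)
            foreach (var item in await query.ReadNextAsync())
                Console.WriteLine(item);
    }
}
```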

The full Microsoft documentation for creating a Cosmos DB instance from the Azure CLI is also a good thing to have in mind.

Consistency

Cosmos DB offers different levels of consistency, named:

  • Strong (always in sync across replicas, increases latency)
  • Bounded Staleness (replicates data async; readers within the same region will see strong consistency)
  • Session (client-centered; clients sharing the same session read their own writes)
  • Consistent Prefix (readers will never see out-of-order writes)
  • Eventual (no ordering guarantee for reads)

Microsoft's documentation depicts the levels of consistency as follows:


Strong -> Bounded Staleness -> Session -> Consistent Prefix -> Eventual
(left to right: higher availability and higher throughput; right to left: stronger consistency)

CLI with cosmos

  • az cosmosdb create
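For example (a hedged sketch; account and group names are placeholders):

  • az cosmosdb create --name mycosmosacct --resource-group mygroup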

Data factory

  • ETL (Extract, Transform, Load) tool used to move and transform data between a source and a destination
  • Resources -> data factory
  • container -> upload csv file
    • data factory
    • source (can even be s3)
    • destination

Blob trigger azure function output to cosmos db

  • The package for that is Microsoft.Azure.WebJobs.Extensions.CosmosDB, combined with the blob trigger (see the sketch below)
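A minimal sketch of an in-process (v3-style) function with a blob trigger and a Cosmos DB output binding; container, database and connection setting names are placeholders:

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class BlobToCosmos
{
    [FunctionName("BlobToCosmos")]
    public static void Run(
        // Fires when a blob lands in the "uploads" container.
        [BlobTrigger("uploads/{name}")] Stream blob,
        string name,
        // Cosmos DB output binding: assigning the out parameter creates a document.
        [CosmosDB(databaseName: "shop", collectionName: "files",
                  ConnectionStringSetting = "CosmosConnection")] out dynamic document,
        ILogger log)
    {
        log.LogInformation($"Processing blob: {name} ({blob.Length} bytes)");
        document = new { id = name, length = blob.Length };
    }
}
```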