As with any other database, finding table sizes and other important information is part of the day-to-day activities of a database administrator or developer. In particular, you can use this information to find large unsorted tables, reclaim storage space, and optimize your Redshift cluster.
You can use the following query to find large tables with more than 50% unsorted rows, along with the percentage of storage used by these tables. These tables are good targets for a vacuum or a deep table copy.
SELECT "schema", "table", size, tbl_rows, unsorted, pct_used
FROM svv_table_info
WHERE unsorted > 50
ORDER BY size DESC;
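If you want to act on the query output programmatically, a small script can turn the rows into maintenance statements. Below is a minimal Python sketch (the helper name and row shape are my own, mirroring the query's columns) that emits a VACUUM statement for each table above the 50% unsorted threshold:

```python
# Sketch: turn svv_table_info-style rows into VACUUM statements.
# Row shape and the 50% threshold mirror the query above; the helper
# name is illustrative, not part of any Redshift API.

def vacuum_statements(rows, unsorted_threshold=50):
    """rows: dicts with 'schema', 'table', and 'unsorted' keys."""
    stmts = []
    for row in rows:
        if row["unsorted"] > unsorted_threshold:
            schema, table = row["schema"], row["table"]
            stmts.append(f'VACUUM FULL "{schema}"."{table}";')
    return stmts

rows = [
    {"schema": "public", "table": "orders", "unsorted": 72.5},
    {"schema": "public", "table": "users", "unsorted": 3.0},
]
# Only "orders" exceeds the 50% unsorted threshold.
print(vacuum_statements(rows))
```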
By default, if you enable audit logging to S3 for an Amazon Redshift cluster, it will not log user activities. This is because the user activity logging parameter is disabled at the cluster level. You need to create a custom parameter group from the Workload Management tab and set enable_user_activity_logging to true.
You can use the following steps to set the enable_user_activity_logging parameter to true:
Step 1: Create a custom parameter group.
Step 2: Set enable_user_activity_logging to true.
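For scripted setups, the same two steps map onto the Redshift API. The sketch below builds the request body for the ModifyClusterParameterGroup call rather than issuing it; the parameter-group name is a hypothetical example, and the boto3 call is shown only in a comment since it needs live credentials:

```python
# Sketch: request body for the Redshift ModifyClusterParameterGroup API.
# The parameter-group name passed in is an example, not a real resource.

def user_activity_logging_params(parameter_group_name):
    """Build kwargs that set enable_user_activity_logging to true."""
    return {
        "ParameterGroupName": parameter_group_name,
        "Parameters": [
            {
                "ParameterName": "enable_user_activity_logging",
                "ParameterValue": "true",
                # Static parameters take effect after a cluster reboot.
                "ApplyType": "static",
            }
        ],
    }

# With boto3 (illustrative, not executed here):
#   boto3.client("redshift").modify_cluster_parameter_group(
#       **user_activity_logging_params("my-custom-parameter-group"))
```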
In addition to querying Redshift system tables for user activities, you also have the option to write audit logs to S3. This option is especially helpful if you are looking to keep a history of user activities for more than just a few days, as the default Redshift system tables only keep data for the last 3-5 days in a rolling manner.
To enable audit logging to S3 buckets, you need to complete the following steps:
Step 1: Enable audit logging from the console.
Step 2: Provide the S3 bucket information and update the bucket policy.
Note: The S3 prefix is optional.
Bucket Policy for Redshift Audit Logs (a sample policy; replace <your-bucket-name> with your own bucket):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Put bucket policy needed for audit logging",
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<your-bucket-name>/*"
    },
    {
      "Sid": "Get bucket policy needed for audit logging",
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::<your-bucket-name>"
    }
  ]
}
Note: This is just for a POC; please use more restrictive access for your production systems.
Step 3: Review the log files in the S3 bucket folders.
Note: Statements are logged as soon as Amazon Redshift receives them. Files on Amazon S3 are updated in batches and can take a few hours to appear.
Compared to many cloud warehouse solutions, Redshift provides some of the best built-in security options. By default, Redshift logs connections and user activities in your database, which can help you audit user activities as may be required for compliance reasons.
Amazon Redshift logs information in the following log files:
Connection log — logs authentication attempts, connections, and disconnections.
User log — logs information about changes to database user definitions.
User activity log — logs each query before it is run on the database.
You can query system tables such as stl_connection_log, stl_userlog, and stl_query to view this information.
Note: To manage disk space, the STL log tables
only retain approximately two to five days of log history, depending on log
usage and available disk space. If you want to retain the log data, you will
need to periodically copy it to other tables or unload it to Amazon S3.
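One way to script the unload route mentioned above: the Python sketch below (function name, S3 path, and IAM role ARN are all placeholders of my own) builds an UNLOAD statement that archives an STL table to S3:

```python
# Sketch: generate an UNLOAD statement to archive an STL log table to S3.
# The S3 prefix and IAM role ARN below are placeholders, and
# ALLOWOVERWRITE is one possible option choice, not a requirement.

def unload_stl(table, s3_prefix, iam_role_arn):
    inner = f"SELECT * FROM {table}"
    return (
        f"UNLOAD ('{inner}') "
        f"TO '{s3_prefix}/{table}_' "
        f"IAM_ROLE '{iam_role_arn}' "
        "ALLOWOVERWRITE;"
    )

stmt = unload_stl(
    "stl_query",
    "s3://my-audit-archive/stl",
    "arn:aws:iam::123456789012:role/my-unload-role",
)
print(stmt)
```

Run the generated statement from your SQL client on a schedule before the rolling retention window expires.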
Many Amazon Redshift customers wonder if they can stop and start their Redshift cluster, as they can with many other AWS instances and services, to reduce cost. After all, the idea with cloud is that you pay only for what you use, and that is true for many AWS services like RDS instances and even Snowflake (a Redshift competitor).
When it comes to Redshift clusters, it's not straightforward to simply stop and start them. It's not because Amazon is deliberately trying to make it hard for customers to stop and start their clusters; the Redshift architecture simply doesn't allow this functionality. Redshift uses local storage on the compute nodes to store data in slices, and if you shut those nodes down, you do not release compute and storage for other cloud customers. Hence AWS cannot provide this functionality without incurring charges. Amazon releases new features for Redshift on a regular basis, and you might see this behavior change in the future.
You can still shut down Redshift instances to reduce charges if you are using on-demand clusters by following the steps below. Basically, you will need to take a snapshot and delete the cluster, then restore your cluster from the snapshot when you need it back online.
To delete the cluster:
1. From the AWS Console, select the cluster you want to stop.
2. Drop down the "Cluster" menu and select "Delete".
3. Enter the snapshot name; I use the same name as the cluster I'm deleting.
To restore the cluster:
1. In the Redshift Snapshots page, select the snapshot of the cluster you want to restore.
2. Drop down the "Actions" menu and select "Restore From Snapshot".
3. Complete the configuration details, and click the "Restore" button at the bottom right.
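The console steps above can also be scripted. Here is a hedged Python sketch: the snapshot-naming helper is runnable (the cluster-plus-date convention is my own choice), while the boto3 delete-with-final-snapshot and restore-from-snapshot calls it feeds are shown only in comments, since they require live credentials:

```python
# Sketch of the stop/start-by-snapshot flow. The naming convention
# (cluster id plus date) is an assumption, echoing the tip above of
# reusing the cluster name for the snapshot.
from datetime import date

def shutdown_snapshot_id(cluster_id, when=None):
    """Name the final snapshot after the cluster and the current date."""
    when = when or date.today()
    return f"{cluster_id}-{when.isoformat()}"

# With boto3 (illustrative, not executed here):
#   rs = boto3.client("redshift")
#   snap = shutdown_snapshot_id("my-cluster")
#   rs.delete_cluster(ClusterIdentifier="my-cluster",
#                     SkipFinalClusterSnapshot=False,
#                     FinalClusterSnapshotIdentifier=snap)
#   # ...later, to bring the cluster back online...
#   rs.restore_from_cluster_snapshot(ClusterIdentifier="my-cluster",
#                                    SnapshotIdentifier=snap)
```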
Your organization might not like the idea of deleting a production cluster with sensitive data just to save cost.
If you are looking to reduce cost with Redshift, it is best to opt for reserved nodes, as they are offered at a significantly lower cost than on-demand cluster nodes.
It might take 30 minutes to an hour to stop/start a Redshift cluster using snapshots for a terabyte-scale database.
Snowflake might be a better option for customers with on-demand workloads, because you can easily shut down and start Snowflake clusters in seconds without deleting any data, as it uses S3 buckets to store data.
Many organizations are looking into migrating their applications to Oracle cloud, and with time their numbers are only going to increase. There are many ways to migrate your application to Oracle cloud, but they all have one common and important phase called application discovery. Application discovery is a process where you collect in-depth information about your source environment for the purpose of cloud migration. There is a lot already published about public cloud migrations, but very little information is available about the application discovery process. The cloud application discovery process is one of the most important initial phases of any cloud migration. This phase will not only help you see if your application is a good fit for public cloud, it will also help you reduce roadblocks and technical issues during the actual migration. If you have a complete, detailed discovery of your application before the migration, it will also help you plan the migration project and reduce post-migration issues.
General Application Questions:
Here are some sample questions that you can ask during your initial contact with the application team.
What is the name & version of the application?
Who are the IT owners of the target application?
What type of database and version is used by this application?
Who is the technical subject matter expert for this application?
Is this a vendor or in-house built application?
Does this application use or manage compliance data like PCI or HIPAA?
Can you provide us any architectural documents for this application?
Does this application have a DR or load balancing solution built in?
Is there a middleware component to this application (WebLogic or WebSphere)?
Who are the business users and owner of this application?
It's important to provide background to the target audience, like explaining the reason for this contact. This initial contact can lead to very important feedback like "This application is already retired" or "This application can never move to cloud for ABC reasons". Ideally, you will use the collected information and architectural documents to get ready for the deep dive in the upcoming discovery steps.
Detailed Application Questions:
Here are examples of some details that you can gather for each section of your detailed application discovery phase:
Is there a middleware component to the application? If yes, detail.
Is there a DR solution for this application? If yes, detail.
Is there a load balancing component to this application? If yes, detail.
Is there a dependency on other applications? If yes, detail.
Is this an internal or customer-facing application?
Is this application part of a group maintenance window? If yes, detail.
Is this application sensitive to network bandwidth? Large data transfers?
What firewall rules are needed for this application?
Is there any large amount of data transfer happening to or from the application?
What type of database/version is used by the application?
What is the size and growth rate of the application database?
What is the RPO/RTO for the database?
Is this a vendor application with a support contract in place? If yes, detail.
Who is supporting this application?
Is this application certified to run from Oracle Cloud?
How much total storage is used by the target application?
What type of storage is used by the target application? NFS, flash, SAN?
What IOPS are required by this application?
How many hosts does this application use, and what are their configurations?
What OS types are used by the target hosts?
Can we upgrade or change the underlying operating system if it is not available in Oracle Cloud?
Risks / Issues:
Is there a compliance requirement for this application?
Do you need to re-architect or make code changes for the cloud migration?
Is there a risk in migrating the application from physical to virtual hosts?
I would like to use this blog to point out some key differentiators of the Oracle Big Data Cloud offering. Oracle's Big Data Cloud is designed for enterprises using open source technologies like Apache Spark and Apache Hadoop. It utilizes Oracle's Infrastructure Cloud Services to deliver a managed, secure, elastic, and integrated platform for all Big Data workloads.
The Oracle Big Data Cloud offering is designed around multiple open source components, and this makes it an ideal big data platform for companies who are looking to adopt open source technologies. This offering provides the customer a choice of open source Big Data technologies with unlimited scalability on a highly available platform. Oracle Big Data Cloud is integrated with Oracle's IaaS and PaaS solutions, giving you an opportunity to spend your resources and time developing new technologies rather than wasting time setting up the environment to analyze the data.
Oracle Big Data Cloud offers a simple but robust solution to Big Data customers. You can spin up a new Apache Hadoop or Apache Spark cluster, or an entire Big Data analytics stack, in minutes. Similarly, you can scale up or down manually, or automatically based on KPI-based policies. You can also launch as many clusters as you need to analyze the data. Oracle Big Data Cloud also offers centralized identity and access management through Identity Cloud Service. Finally, Oracle Big Data Cloud supports elasticity at three levels of the solution:
• Cluster Elasticity: Scale out/in and up/down based on KPI-based policies
• Storage Elasticity: Scale the storage tier independently from the compute tier
• Compute Elasticity: Choose from a variety of compute shapes depending on the workload
As per Oracle, Oracle Big Data Cloud is integrated with the Oracle Identity Cloud Service to provide a centralized way of enabling user access to the service and robust auditing of that access. Kerberized clusters are integrated with Oracle's central identity store to provide strong authentication for all of the cluster's services. Oracle Big Data Cloud protects data in motion and data at rest through encryption. All service lifecycle as well as service consumption REST calls to the service are also protected through HTTPS. Software-defined networking enables the customers of Oracle Big Data Cloud to have fine-grained control over network security. Customers are able to define VPNs, control which ports get exposed, and whitelist IPs through self-service.
Hardware & network profiling is another important phase of the application discovery process regardless of the end goal, which can be a public cloud migration or a data center migration. It includes capturing statistics about hardware usage like CPU utilization, memory usage, network bandwidth, data transfer, and IOPS. Capturing hardware statistics is important since they will be used for capacity planning and provisioning hardware resources in Oracle public cloud for the target application. Any miscalculation in this part can lead to under-provisioning of hardware resources in Oracle public cloud, which can cause the application to perform poorly after migration.
Hardware statistics can be gathered using many methods, depending on the operating systems and existing monitoring tools. Many organizations have already deployed robust monitoring and alerting tools like Oracle Enterprise Manager or OS Watcher to monitor their systems. You should be able to gather most of the required information from existing reports and logs. Additionally, you can also install user-defined shell or Oracle PL/SQL scripts to gather the rest of the information. One also should not ignore the importance of gathering network traffic and firewall information, as they are crucial in developing a secure network in Oracle cloud.
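However the raw numbers are collected, you will want to reduce them to a few sizing figures for capacity planning. The Python sketch below (the helper name and the 95th-percentile choice are my own, not a prescribed formula) summarizes a series of utilization samples:

```python
# Sketch: reduce utilization samples (e.g. CPU percent, sampled over the
# profiling window) to sizing figures. Using the 95th percentile
# alongside the absolute peak is a common, but not universal, choice.

def capacity_summary(samples):
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    p95_index = round(0.95 * (len(ordered) - 1))
    return {
        "avg": sum(samples) / len(samples),
        "peak": ordered[-1],
        "p95": ordered[p95_index],
    }

cpu_samples = [12.0, 35.5, 41.0, 38.2, 95.0, 40.1]
print(capacity_summary(cpu_samples))
```

Feeding the peak and p95 figures (rather than the average alone) into shape selection helps avoid the under-provisioning risk described above.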
Application profiling is an important phase of Oracle cloud migration; it's designed to gather in-depth information about the target application. Based on the information that is collected during the initial contacts, you can initiate the application profiling phase. Information gathered during this phase can be used for migration planning and provisioning hardware resources in Oracle public cloud. This phase will also help you determine if all the application components are available in Oracle public cloud, like load balancers, a remote DR solution, and storage replication. Depending on the application type, you might need to dig into the application code to see if it can be forklifted to Oracle public cloud without making any changes, as sometimes application code contains hostnames and IP addresses. Similarly, understanding the application logic and data set can be equally important for the upcoming migration. If the target application ingests or provides a huge amount of data to other applications in the organization, this can present a challenge for migration because of network bandwidth and transfer charges. Additionally, if this application contains PCI or HIPAA compliance-related data, it will require more scrutiny and approvals from the compliance department.
It's also important to gather all test scripts for the target application. Review the test scripts to determine if they can be used during the migration for testing application performance and functionality. If test scripts are not available or are incomplete, this should be marked as a risk for the migration to Oracle cloud. Finally, capture application performance statistics for all critical processes and queries so they can be used as a baseline for the upcoming migration.
There are many profiling tools available in the market nowadays, but here is a list of some Oracle profiling tools you can use for application and data profiling based on your needs.
As mentioned earlier, the application discovery process involves gathering detailed information about the target application. This may include conducting interviews with application users & developers, application & data profiling, and hardware & network profiling. The application discovery process usually starts with interviewing technical and business users to help you understand the functionality, logic, and flow of the target application. You can also start application profiling while you are conducting interviews; this process will involve using many techniques and tools to gather application components, dependencies, security, data, and compliance information about the target environment. Similarly, you will need to profile the existing hardware and network to map out the target environments to services and resources available in Oracle public cloud. It's important to note that Oracle public cloud is an enterprise cloud offering many application components as platform services, like Load Balancer, DBaaS, GoldenGate, Business Intelligence, etc. I recommend that you analyze and compare all your application components to see if you can replace any of them with an Oracle cloud service to decrease cost and complexity.
It is also important to understand that there are a few application discovery tools offered by cloud vendors which can facilitate the application discovery process. But we found that those tools were not helpful in many situations, since very few customers will let you connect to their production systems using those tools. Keeping that in mind, this article is mostly focused on the manual process and steps to complete application discovery, while referencing many Oracle and third-party tools as needed. Additionally, the application discovery process is almost the same for any public cloud migration, but this article will focus on Oracle public cloud.