From patchwork Tue Aug 29 04:31:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Feng X-Patchwork-Id: 1827082 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=XDSVev2M; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RZZMg2zvKz1yfX for ; Tue, 29 Aug 2023 14:34:39 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 309C73857721 for ; Tue, 29 Aug 2023 04:34:36 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 309C73857721 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1693283676; bh=yB6a+Vtqeb3RUqvWL8l2XKInup/SyJlD6bmKuXCTNS8=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=XDSVev2MkU0lcrkBWd7GGVGV33dFrTVwFp6qSq8NxZWbZuLxGfnSU9XAgrlOtouW5 FiZxMxWX3Wk9rhPiEa8a2bplyXlEtVPGqwWZmKb4+aysHmxP/t6KalBeEDP0tB9cRP Nij/ejLA/jVbI1Qf8LCRGvzVVNcaYzmV2Er5LxZc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) by sourceware.org (Postfix) with ESMTPS id 331DF3858D33 for ; Tue, 29 Aug 2023 04:32:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 331DF3858D33 Received: from pps.filterd (m0167077.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 37T3xAoq028107 for ; Tue, 29 Aug 2023 00:32:04 -0400 Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 3sqayvhhu5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 29 Aug 2023 00:32:04 -0400 Received: by mail-qv1-f70.google.com with SMTP id 6a1803df08f44-64a5f9a165aso48098386d6.1 for ; Mon, 28 Aug 2023 21:32:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693283523; x=1693888323; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yB6a+Vtqeb3RUqvWL8l2XKInup/SyJlD6bmKuXCTNS8=; b=UC+EnYZ8z90REweVTXaLDO035GHnP5diFldbWL1bOSDeH9sJMzUhsFFzfKt54mn7wd uwXuIEYZTJ/PQ6ZFiOcxaRU3KqrhzLKQ1wCnpmKhqRlhGZbClbx6t9bJDTlAMjRY2elA P8ztff5+bF4Zrxxa7EHzKZW3ft7h3jv3duUYBzwySLSm/2Uq3+FSxec3Buqbzfow0W4k u3H8pjcCh5RRew8LZ8ncjVKiPhMmGfMsEkt1sXSgf6napMdtX9YS2b+6lU7tsCQm34ie 9BGN0+WPjpSRypG7RUg3iIk1hiECOaTaa/hGLrERDnJQLHD72b+SR08oAczkJXo3Glic NizA== X-Gm-Message-State: AOJu0YyfFmraoDdUO7vAXZYbnZBYHNqX/hq/u5fTGsT/WqRUZBHWNJFV E5OU55bfxEb2wvwyv16uA7hHzAV1HyiGgHA92qbwTcJpdO4VlNHMJ20Gr68/y6zpDJQns4IHLNf o6BVTlxlqkcTx+ec= X-Received: by 2002:a0c:aacc:0:b0:647:8e35:623b with SMTP id g12-20020a0caacc000000b006478e35623bmr25814051qvb.15.1693283523227; Mon, 28 Aug 2023 21:32:03 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEiMva+JamhAVmhxZPqXyUuOfTXh91XKNuxBILm2RF2t6xnRNjM974brf5R3SraSibDZ+CBiA== X-Received: by 2002:a0c:aacc:0:b0:647:8e35:623b with SMTP id g12-20020a0caacc000000b006478e35623bmr25814034qvb.15.1693283522579; Mon, 28 Aug 2023 21:32:02 -0700 (PDT) Received: from Ericcs-MBP.cable.rcn.com (207-38-164-63.s9254.c3-0.43d-cbr1.qens-43d.ny.cable.rcncustomer.com. [207.38.164.63]) by smtp.gmail.com with ESMTPSA id d20-20020a0caa14000000b0064f378f89a7sm3095114qvb.73.2023.08.28.21.32.02 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 28 Aug 2023 21:32:02 -0700 (PDT) To: dmalcolm@redhat.com Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng Subject: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Date: Tue, 29 Aug 2023 00:31:55 -0400 Message-Id: <20230829043155.17651-1-ef2648@columbia.edu> X-Mailer: git-send-email 2.32.0 (Apple Git-132) In-Reply-To: References: MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: jThrvI271frMW2dRQUnERfIGHU4ND0BY X-Proofpoint-GUID: jThrvI271frMW2dRQUnERfIGHU4ND0BY X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.957,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-08-29_01,2023-08-28_04,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=10 lowpriorityscore=10 suspectscore=0 mlxscore=0 priorityscore=1501 malwarescore=0 spamscore=0 mlxlogscore=999 clxscore=1015 bulkscore=10 adultscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2308290039 X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Eric Feng via Gcc-patches From: Eric Feng Reply-To: Eric Feng Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi Dave, Thanks for the feedback. I've addressed the changes you mentioned in addition to adding more test cases. I've also taken this chance to split the test files according to known function subclasses, as you previously suggested. Since there were also some changes to the core analyzer, I've done a bootstrap and regtested the patch as well. Does it look OK for trunk? Best, Eric --- This patch introduces initial support for reference count checking of PyObjects in relation to the Python/C API for the CPython plugin. Additionally, the core analyzer underwent several modifications to accommodate this feature. These include: - Introducing support for callbacks at the end of region_model::pop_frame. This is our current point of validation for the reference count of PyObjects. - An added optional custom stmt_finder parameter to region_model_context::warn. This aids in emitting a diagnostic concerning the reference count, especially when the stmt_finder is NULL, which is currently the case during region_model::pop_frame. The current diagnostic we emit relating to the reference count appears as follows: rc3.c:23:10: warning: expected to have reference count: ‘1’ but ob_refcnt field is: ‘2’ 23 | return list; | ^~~~ ‘create_py_object’: events 1-4 | | 4 | PyObject* item = PyLong_FromLong(3); | | ^~~~~~~~~~~~~~~~~~ | | | | | (1) when ‘PyLong_FromLong’ succeeds | 5 | PyObject* list = PyList_New(1); | | ~~~~~~~~~~~~~ | | | | | (2) when ‘PyList_New’ succeeds |...... | 14 | PyList_Append(list, item); | | ~~~~~~~~~~~~~~~~~~~~~~~~~ | | | | | (3) when ‘PyList_Append’ succeeds, moving buffer |...... | 23 | return list; | | ~~~~ | | | | | (4) here | This is a WIP in several ways: - Enhancing the diagnostic for better clarity. For instance, users should expect to see the variable name 'item' instead of the placeholder in the diagnostic above. - Currently, functions returning PyObject * are assumed to always produce a new reference. - The validation of reference count is only for PyObjects created within a function body. Verifying reference counts for PyObjects passed as parameters is not supported in this patch. gcc/analyzer/ChangeLog: PR analyzer/107646 * engine.cc (impl_region_model_context::warn): New optional parameter. * exploded-graph.h (class impl_region_model_context): Likewise. * region-model.cc (region_model::pop_frame): New callback feature for * region_model::pop_frame. * region-model.h (struct append_regions_cb_data): Likewise. (class region_model): Likewise. (class region_model_context): New optional parameter. (class region_model_context_decorator): Likewise. gcc/testsuite/ChangeLog: PR analyzer/107646 * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count * checking for PyObjects. * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to... * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and * added more tests). * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to... * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added * more tests). * gcc.dg/plugin/plugin.exp: New tests. * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test. * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test. * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test. Signed-off-by: Eric Feng --- gcc/analyzer/engine.cc | 8 +- gcc/analyzer/exploded-graph.h | 4 +- gcc/analyzer/region-model.cc | 3 + gcc/analyzer/region-model.h | 48 ++- .../gcc.dg/plugin/analyzer_cpython_plugin.c | 376 +++++++++++++++++- ....c => cpython-plugin-test-PyList_Append.c} | 56 +-- .../plugin/cpython-plugin-test-PyList_New.c | 38 ++ .../cpython-plugin-test-PyLong_FromLong.c | 38 ++ ...st-1.c => cpython-plugin-test-no-plugin.c} | 0 .../cpython-plugin-test-refcnt-checking.c | 78 ++++ gcc/testsuite/gcc.dg/plugin/plugin.exp | 5 +- 11 files changed, 612 insertions(+), 42 deletions(-) rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => cpython-plugin-test-PyList_Append.c} (64%) create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => cpython-plugin-test-no-plugin.c} (100%) create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index a1908cdb364..736a41ecdaf 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -115,10 +115,12 @@ impl_region_model_context (program_state *state, } bool -impl_region_model_context::warn (std::unique_ptr d) +impl_region_model_context::warn (std::unique_ptr d, + const stmt_finder *custom_finder) { LOG_FUNC (get_logger ()); - if (m_stmt == NULL && m_stmt_finder == NULL) + auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder; + if (m_stmt == NULL && curr_stmt_finder == NULL) { if (get_logger ()) get_logger ()->log ("rejecting diagnostic: no stmt"); @@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr d) bool terminate_path = d->terminate_path_p (); if (m_eg->get_diagnostic_manager ().add_diagnostic (m_enode_for_diag, m_enode_for_diag->get_supernode (), - m_stmt, m_stmt_finder, std::move (d))) + m_stmt, curr_stmt_finder, std::move (d))) { if (m_path_ctxt && terminate_path diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h index 5a7ab645bfe..6e9a5ef58c7 100644 --- a/gcc/analyzer/exploded-graph.h +++ b/gcc/analyzer/exploded-graph.h @@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context uncertainty_t *uncertainty, logger *logger = NULL); - bool warn (std::unique_ptr d) final override; + bool warn (std::unique_ptr d, + const stmt_finder *custom_finder = NULL) final override; void add_note (std::unique_ptr pn) final override; void add_event (std::unique_ptr event) final override; void on_svalue_leak (const svalue *) override; @@ -107,6 +108,7 @@ class impl_region_model_context : public region_model_context std::unique_ptr *out_sm_context) override; const gimple *get_stmt () const override { return m_stmt; } + const exploded_graph *get_eg () const override { return m_eg; } exploded_graph *m_eg; log_user m_logger; diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index 4f31a6dcf0f..eb4f976b83a 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3. If not see namespace ana { +auto_vec region_model::pop_frame_callbacks; + /* Dump T to PP in language-independent form, for debugging/logging/dumping purposes. */ @@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue, } unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK); + notify_on_pop_frame (this, retval, ctxt); } /* Get the number of frames in this region_model's stack. */ diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h index 10b2a59e787..440ea6d828d 100644 --- a/gcc/analyzer/region-model.h +++ b/gcc/analyzer/region-model.h @@ -236,6 +236,10 @@ public: struct append_regions_cb_data; +typedef void (*pop_frame_callback) (const region_model *model, + const svalue *retval, + region_model_context *ctxt); + /* A region_model encapsulates a representation of the state of memory, with a tree of regions, along with their associated values. The representation is graph-like because values can be pointers to @@ -532,6 +536,20 @@ class region_model get_builtin_kf (const gcall *call, region_model_context *ctxt = NULL) const; + static void + register_pop_frame_callback (const pop_frame_callback &callback) + { + pop_frame_callbacks.safe_push (callback); + } + + static void + notify_on_pop_frame (const region_model *model, const svalue *retval, + region_model_context *ctxt) + { + for (auto &callback : pop_frame_callbacks) + callback (model, retval, ctxt); + } + private: const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const; const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const; @@ -621,6 +639,7 @@ private: tree callee_fndecl, region_model_context *ctxt) const; + static auto_vec pop_frame_callbacks; /* Storing this here to avoid passing it around everywhere. */ region_model_manager *const m_mgr; @@ -649,8 +668,10 @@ class region_model_context { public: /* Hook for clients to store pending diagnostics. - Return true if the diagnostic was stored, or false if it was deleted. */ - virtual bool warn (std::unique_ptr d) = 0; + Return true if the diagnostic was stored, or false if it was deleted. + Optionally provide a custom stmt_finder. */ + virtual bool warn (std::unique_ptr d, + const stmt_finder *custom_finder = NULL) = 0; /* Hook for clients to add a note to the last previously stored pending diagnostic. */ @@ -757,6 +778,8 @@ class region_model_context /* Get the current statement, if any. */ virtual const gimple *get_stmt () const = 0; + + virtual const exploded_graph *get_eg () const = 0; }; /* A "do nothing" subclass of region_model_context. */ @@ -764,7 +787,8 @@ class region_model_context class noop_region_model_context : public region_model_context { public: - bool warn (std::unique_ptr) override { return false; } + bool warn (std::unique_ptr d, + const stmt_finder *custom_finder) override { return false; } void add_note (std::unique_ptr) override; void add_event (std::unique_ptr) override; void on_svalue_leak (const svalue *) override {} @@ -812,6 +836,7 @@ public: } const gimple *get_stmt () const override { return NULL; } + const exploded_graph *get_eg () const override { return NULL; } }; /* A subclass of region_model_context for determining if operations fail @@ -840,7 +865,8 @@ private: class region_model_context_decorator : public region_model_context { public: - bool warn (std::unique_ptr d) override + bool warn (std::unique_ptr d, + const stmt_finder *custom_finder) { if (m_inner) return m_inner->warn (std::move (d)); @@ -978,6 +1004,14 @@ class region_model_context_decorator : public region_model_context return nullptr; } + const exploded_graph *get_eg () const override + { + if (m_inner) + return m_inner->get_eg (); + else + return nullptr; + } + protected: region_model_context_decorator (region_model_context *inner) : m_inner (inner) @@ -993,7 +1027,8 @@ protected: class annotating_context : public region_model_context_decorator { public: - bool warn (std::unique_ptr d) override + bool warn (std::unique_ptr d, + const stmt_finder *custom_finder) override { if (m_inner) if (m_inner->warn (std::move (d))) @@ -1158,7 +1193,8 @@ using namespace ::selftest; class test_region_model_context : public noop_region_model_context { public: - bool warn (std::unique_ptr d) final override + bool warn (std::unique_ptr d, + const stmt_finder *custom_finder) final override { m_diagnostics.safe_push (d.release ()); return true; diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c index 7cd72e8a886..b2caed8fc1b 100644 --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c @@ -44,6 +44,7 @@ #include "analyzer/region-model.h" #include "analyzer/call-details.h" #include "analyzer/call-info.h" +#include "analyzer/exploded-graph.h" #include "make-unique.h" int plugin_is_GPL_compatible; @@ -191,6 +192,372 @@ public: } }; +/* This is just a copy of leak_stmt_finder for now (subject to change if + * necssary) */ + +class refcnt_stmt_finder : public stmt_finder +{ +public: + refcnt_stmt_finder (const exploded_graph &eg, tree var) + : m_eg (eg), m_var (var) + { + } + + std::unique_ptr + clone () const final override + { + return make_unique (m_eg, m_var); + } + + const gimple * + find_stmt (const exploded_path &epath) final override + { + logger *const logger = m_eg.get_logger (); + LOG_FUNC (logger); + + if (m_var && TREE_CODE (m_var) == SSA_NAME) + { + /* Locate the final write to this SSA name in the path. */ + const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var); + + int idx_of_def_stmt; + bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt); + if (!found) + goto not_found; + + /* What was the next write to the underlying var + after the SSA name was set? (if any). */ + + for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length (); + ++idx) + { + const exploded_edge *eedge = epath.m_edges[idx]; + if (logger) + logger->log ("eedge[%i]: EN %i -> EN %i", idx, + eedge->m_src->m_index, + eedge->m_dest->m_index); + const exploded_node *dst_node = eedge->m_dest; + const program_point &dst_point = dst_node->get_point (); + const gimple *stmt = dst_point.get_stmt (); + if (!stmt) + continue; + if (const gassign *assign = dyn_cast (stmt)) + { + tree lhs = gimple_assign_lhs (assign); + if (TREE_CODE (lhs) == SSA_NAME + && SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var)) + return assign; + } + } + } + + not_found: + + /* Look backwards for the first statement with a location. */ + int i; + const exploded_edge *eedge; + FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge) + { + if (logger) + logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index, + eedge->m_dest->m_index); + const exploded_node *dst_node = eedge->m_dest; + const program_point &dst_point = dst_node->get_point (); + const gimple *stmt = dst_point.get_stmt (); + if (stmt) + if (get_pure_location (stmt->location) != UNKNOWN_LOCATION) + return stmt; + } + + gcc_unreachable (); + return NULL; + } + +private: + const exploded_graph &m_eg; + tree m_var; +}; + +class refcnt_mismatch : public pending_diagnostic_subclass +{ +public: + refcnt_mismatch (const region *base_region, + const svalue *ob_refcnt, + const svalue *actual_refcnt, + tree reg_tree) + : m_base_region (base_region), m_ob_refcnt (ob_refcnt), + m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree) + { + } + + const char * + get_kind () const final override + { + return "refcnt_mismatch"; + } + + bool + operator== (const refcnt_mismatch &other) const + { + return (m_base_region == other.m_base_region + && m_ob_refcnt == other.m_ob_refcnt + && m_actual_refcnt == other.m_actual_refcnt); + } + + int get_controlling_option () const final override + { + return 0; + } + + bool + emit (rich_location *rich_loc, logger *) final override + { + diagnostic_metadata m; + bool warned; + // just assuming constants for now + auto actual_refcnt + = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant (); + auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant (); + warned = warning_meta ( + rich_loc, m, get_controlling_option (), + "expected to have " + "reference count: %qE but ob_refcnt field is: %qE", + actual_refcnt, ob_refcnt); + + // location_t loc = rich_loc->get_loc (); + // foo (loc); + return warned; + } + + void mark_interesting_stuff (interesting_t *interest) final override + { + if (m_base_region) + interest->add_region_creation (m_base_region); + } + +private: + + void foo(location_t loc) const + { + inform(loc, "something is up right here"); + } + const region *m_base_region; + const svalue *m_ob_refcnt; + const svalue *m_actual_refcnt; + tree m_reg_tree; +}; + +/* Retrieves the svalue associated with the ob_refcnt field of the base region. + */ +static const svalue * +retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model, + region_model_context *ctxt) +{ + region_model_manager *mgr = model->get_manager (); + tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt"); + const region *ob_refcnt_region + = mgr->get_field_region (base_reg, ob_refcnt_tree); + const svalue *ob_refcnt_sval + = model->get_store_value (ob_refcnt_region, ctxt); + return ob_refcnt_sval; +} + +static void +increment_region_refcnt (hash_map &map, const region *key) +{ + bool existed; + auto &refcnt = map.get_or_insert (key, &existed); + refcnt = existed ? refcnt + 1 : 1; +} + + +/* Recursively fills in region_to_refcnt with the references owned by + pyobj_ptr_sval. */ +static void +count_expected_pyobj_references (const region_model *model, + hash_map ®ion_to_refcnt, + const svalue *pyobj_ptr_sval, + hash_set &seen) +{ + if (!pyobj_ptr_sval) + return; + + const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue (); + const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue (); + if (!pyobj_region_sval && !pyobj_initial_sval) + return; + + // todo: support initial sval (e.g passed in as parameter) + if (pyobj_initial_sval) + { + // increment_region_refcnt (region_to_refcnt, + // pyobj_initial_sval->get_region ()); + return; + } + + const region *pyobj_region = pyobj_region_sval->get_pointee (); + if (!pyobj_region || seen.contains (pyobj_region)) + return; + + seen.add (pyobj_region); + + if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree) + increment_region_refcnt (region_to_refcnt, pyobj_region); + + const auto *curr_store = model->get_store (); + const auto *retval_cluster = curr_store->get_cluster (pyobj_region); + if (!retval_cluster) + return; + + const auto &retval_binding_map = retval_cluster->get_map (); + + for (const auto &binding : retval_binding_map) + { + const svalue *binding_sval = binding.second; + const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable (); + const region *pointee = unwrapped_sval->maybe_get_region (); + + if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED) + count_expected_pyobj_references (model, region_to_refcnt, binding_sval, + seen); + } +} + +/* Compare ob_refcnt field vs the actual reference count of a region */ +static void +check_refcnt (const region_model *model, region_model_context *ctxt, + const hash_map::iterator::reference_pair region_refcnt) +{ + region_model_manager *mgr = model->get_manager (); + const auto &curr_region = region_refcnt.first; + const auto &actual_refcnt = region_refcnt.second; + const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt); + const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst ( + ob_refcnt_sval->get_type (), actual_refcnt); + + if (ob_refcnt_sval != actual_refcnt_sval) + { + // todo: fix this + tree reg_tree = model->get_representative_tree (curr_region); + + const auto &eg = ctxt->get_eg (); + refcnt_stmt_finder finder (*eg, reg_tree); + auto pd = make_unique (curr_region, ob_refcnt_sval, + actual_refcnt_sval, reg_tree); + if (pd && eg) + ctxt->warn (std::move (pd), &finder); + } +} + +static void +check_refcnts (const region_model *model, const svalue *retval, + region_model_context *ctxt, + hash_map ®ion_to_refcnt) +{ + for (const auto ®ion_refcnt : region_to_refcnt) + { + check_refcnt(model, ctxt, region_refcnt); + } +} + +/* Validates the reference count of all Python objects. */ +void +pyobj_refcnt_checker (const region_model *model, const svalue *retval, + region_model_context *ctxt) +{ + if (!ctxt) + return; + + auto region_to_refcnt = hash_map (); + auto seen_regions = hash_set (); + + count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions); + check_refcnts (model, retval, ctxt, region_to_refcnt); +} + +/* Counts the actual pyobject references from all clusters in the model's + * store. */ +static void +count_all_references (const region_model *model, + hash_map ®ion_to_refcnt) +{ + for (const auto &cluster : *model->get_store ()) + { + auto curr_region = cluster.first; + if (curr_region->get_kind () != RK_HEAP_ALLOCATED) + continue; + + increment_region_refcnt (region_to_refcnt, curr_region); + + auto binding_cluster = cluster.second; + for (const auto &binding : binding_cluster->get_map ()) + { + const svalue *binding_sval = binding.second; + + const svalue *unwrapped_sval + = binding_sval->unwrap_any_unmergeable (); + // if (unwrapped_sval->get_type () != pyobj_ptr_tree) + // continue; + + const region *pointee = unwrapped_sval->maybe_get_region (); + if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED) + continue; + + increment_region_refcnt (region_to_refcnt, pointee); + } + } +} + +static void +dump_refcnt_info (const hash_map ®ion_to_refcnt, + const region_model *model, region_model_context *ctxt) +{ + region_model_manager *mgr = model->get_manager (); + pretty_printer pp; + pp_format_decoder (&pp) = default_tree_printer; + pp_show_color (&pp) = pp_show_color (global_dc->printer); + pp.buffer->stream = stderr; + + for (const auto ®ion_refcnt : region_to_refcnt) + { + auto region = region_refcnt.first; + auto actual_refcnt = region_refcnt.second; + const svalue *ob_refcnt_sval + = retrieve_ob_refcnt_sval (region, model, ctxt); + const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst ( + ob_refcnt_sval->get_type (), actual_refcnt); + + region->dump_to_pp (&pp, true); + pp_string (&pp, " — ob_refcnt: "); + ob_refcnt_sval->dump_to_pp (&pp, true); + pp_string (&pp, " actual refcnt: "); + actual_refcnt_sval->dump_to_pp (&pp, true); + pp_newline (&pp); + } + pp_string (&pp, "~~~~~~~~\n"); + pp_flush (&pp); +} + +class kf_analyzer_cpython_dump_refcounts : public known_function +{ +public: + bool matches_call_types_p (const call_details &cd) const final override + { + return cd.num_args () == 0; + } + void impl_call_pre (const call_details &cd) const final override + { + region_model_context *ctxt = cd.get_ctxt (); + if (!ctxt) + return; + region_model *model = cd.get_model (); + auto region_to_refcnt = hash_map (); + count_all_references(model, region_to_refcnt); + dump_refcnt_info(region_to_refcnt, model, ctxt); + } +}; + /* Some concessions were made to simplify the analysis process when comparing kf_PyList_Append with the real implementation. In particular, PyList_Append performs some @@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */) iface->register_known_function ("PyList_New", make_unique ()); iface->register_known_function ("PyLong_FromLong", make_unique ()); + + iface->register_known_function ( + "__analyzer_cpython_dump_refcounts", + make_unique ()); } } // namespace ana @@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info, const char *plugin_name = plugin_info->base_name; if (0) inform (input_location, "got here; %qs", plugin_name); - ana::register_finish_translation_unit_callback (&stash_named_types); - ana::register_finish_translation_unit_callback (&stash_global_vars); + register_finish_translation_unit_callback (&stash_named_types); + register_finish_translation_unit_callback (&stash_global_vars); + region_model::register_pop_frame_callback(pyobj_refcnt_checker); register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT, ana::cpython_analyzer_init_cb, NULL); /* void *user_data */ diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c similarity index 64% rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c index 19b5c17428a..9912f9105d4 100644 --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c @@ -8,34 +8,6 @@ #include #include "../analyzer/analyzer-decls.h" -PyObject * -test_PyList_New (Py_ssize_t len) -{ - PyObject *obj = PyList_New (len); - if (obj) - { - __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ - __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */ - } - else - __analyzer_dump_path (); /* { dg-message "path" } */ - return obj; -} - -PyObject * -test_PyLong_New (long n) -{ - PyObject *obj = PyLong_FromLong (n); - if (obj) - { - __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ - __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */ - } - else - __analyzer_dump_path (); /* { dg-message "path" } */ - return obj; -} - PyObject * test_PyListAppend (long n) { @@ -43,6 +15,7 @@ test_PyListAppend (long n) PyObject *list = PyList_New (0); PyList_Append(list, item); return list; /* { dg-warning "leak of 'item'" } */ + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ } PyObject * @@ -67,6 +40,7 @@ test_PyListAppend_2 (long n) else __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */ return list; /* { dg-warning "leak of 'item'" } */ + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ } @@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list) { PyList_Append (list, item); return list; +} + +PyObject * +test_PyListAppend_4 (long n) +{ + PyObject *item = PyLong_FromLong (n); + PyObject *list = NULL; + PyList_Append(list, item); + return list; +} + +PyObject * +test_PyListAppend_5 () +{ + PyObject *list = PyList_New (0); + PyList_Append(list, NULL); + return list; +} + +PyObject * +test_PyListAppend_6 () +{ + PyObject *item = NULL; + PyObject *list = NULL; + PyList_Append(list, item); + return list; } \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c new file mode 100644 index 00000000000..492d4f7d58d --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target analyzer } */ +/* { dg-options "-fanalyzer" } */ +/* { dg-require-python-h "" } */ + + +#define PY_SSIZE_T_CLEAN +#include +#include "../analyzer/analyzer-decls.h" + +PyObject * +test_PyList_New (Py_ssize_t len) +{ + PyObject *obj = PyList_New (len); + if (obj) + { + __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */ + } + else + __analyzer_dump_path (); /* { dg-message "path" } */ + return obj; +} + +void +test_PyList_New_2 () +{ + PyObject *obj = PyList_New (0); +} /* { dg-warning "leak of 'obj'" } */ + +PyObject *test_stray_incref_PyList () +{ + PyObject *p = PyList_New (2); + if (p) + Py_INCREF (p); + return p; + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ +} \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c new file mode 100644 index 00000000000..97b29849302 --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target analyzer } */ +/* { dg-options "-fanalyzer" } */ +/* { dg-require-python-h "" } */ + + +#define PY_SSIZE_T_CLEAN +#include +#include "../analyzer/analyzer-decls.h" + +PyObject * +test_PyLong_New (long n) +{ + PyObject *obj = PyLong_FromLong (n); + if (obj) + { + __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */ + } + else + __analyzer_dump_path (); /* { dg-message "path" } */ + return obj; +} + +void +test_PyLong_New_2 (long n) +{ + PyObject *obj = PyLong_FromLong (n); +} /* { dg-warning "leak of 'obj'" } */ + +PyObject *test_stray_incref_PyLong (long val) +{ + PyObject *p = PyLong_FromLong (val); + if (p) + Py_INCREF (p); + return p; + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ +} \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c similarity index 100% rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c new file mode 100644 index 00000000000..9912f9105d4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c @@ -0,0 +1,78 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target analyzer } */ +/* { dg-options "-fanalyzer" } */ +/* { dg-require-python-h "" } */ + + +#define PY_SSIZE_T_CLEAN +#include +#include "../analyzer/analyzer-decls.h" + +PyObject * +test_PyListAppend (long n) +{ + PyObject *item = PyLong_FromLong (n); + PyObject *list = PyList_New (0); + PyList_Append(list, item); + return list; /* { dg-warning "leak of 'item'" } */ + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ +} + +PyObject * +test_PyListAppend_2 (long n) +{ + PyObject *item = PyLong_FromLong (n); + if (!item) + return NULL; + + __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + PyObject *list = PyList_New (n); + if (!list) + { + Py_DECREF(item); + return NULL; + } + + __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + + if (PyList_Append (list, item) < 0) + __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + else + __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */ + return list; /* { dg-warning "leak of 'item'" } */ + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ +} + + +PyObject * +test_PyListAppend_3 (PyObject *item, PyObject *list) +{ + PyList_Append (list, item); + return list; +} + +PyObject * +test_PyListAppend_4 (long n) +{ + PyObject *item = PyLong_FromLong (n); + PyObject *list = NULL; + PyList_Append(list, item); + return list; +} + +PyObject * +test_PyListAppend_5 () +{ + PyObject *list = PyList_New (0); + PyList_Append(list, NULL); + return list; +} + +PyObject * +test_PyListAppend_6 () +{ + PyObject *item = NULL; + PyObject *list = NULL; + PyList_Append(list, item); + return list; +} \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp index e1ed2d2589e..cbef6da8d86 100644 --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp @@ -161,8 +161,9 @@ set plugin_test_list [list \ taint-CVE-2011-0521-6.c \ taint-antipatterns-1.c } \ { analyzer_cpython_plugin.c \ - cpython-plugin-test-1.c \ - cpython-plugin-test-2.c } \ + cpython-plugin-test-PyList_Append.c \ + cpython-plugin-test-PyList_New.c \ + cpython-plugin-test-PyLong_FromLong.c } \ ] foreach plugin_test $plugin_test_list {